SUMMARY. In the x-ray crystallographic structure analysis of
macromolecules containing thousands of atoms, much success has been
achieved during the last decade by the application of multiple
isomorphous series and anomalous scattering methods. These techniques
attempt to obtain information regarding the missing relative phase
angles of reflections by converting phase information into measurable
amplitude information. This is achieved by measuring the amplitude
change that occurs in the structure factor vector when a scattering
vector of known amplitude and phase is added. Simple relationships
exist relating the changes in amplitude to the unknown phase angle.
In practice this is achieved by preparing heavy atom isomorphs of the
crystal and measuring the diffracted intensities. A proper application
of this method presupposes knowledge of the locations and scattering
properties of the heavy atom substituents, and these parameters can
also be obtained and refined from the diffraction intensities
themselves by the application of well established crystallographic
techniques.


Introduction. Developments in direct methods during the past decade
have made the structure elucidation of medium sized molecules
containing fifty atoms or so a fairly routine matter. However, many
macromolecules that are being studied at present by x-ray diffraction
have molecular weights in the tens of thousands of Daltons. The
complete description of their three dimensional structure involves
the knowledge of the positions of a few thousand atoms. The
experimental and computational difficulties are compounded by the
fact that very few macromolecules give diffraction data beyond 2 Å
resolution. Even with the best available crystal, this lack of
resolution limits the ratio of the number of observations to the
number of atomic parameters to be determined to the order of two or
three. Hence, the direct statistical methods of obtaining phase
information, which make use of the over-abundance of observations
compared to the number of atomic parameters in small molecule
crystallography, have so far been of very limited use in
macromolecular studies. The successful structure determination of
these large molecules has until now depended on the use of other
indirect information regarding certain known structural features in
the crystal and their scattering behaviour to help in the application
of the x-ray diffraction methods. In particular, the tremendous
successes in the determination of globular protein structures during
the last decade came through the use of the isomorphous replacement
and anomalous scattering methods.


Principle of the method. To build up a picture of the unit cell
contents of the crystal one needs not only the measured structure
amplitudes of the reflections but also their relative phases. The
x-ray diffraction measurement, however, yields only the amplitude
of the scattered wave but not its phase. Hence, any experimental
strategy for obtaining the relative phases of the reflected waves
is in effect a method of converting phase differences into measurable
amplitude differences. The principle is illustrated in Fig. 1.


Fig. 1 


Let us assume that the vector OA represents a structure factor of
known amplitude but undetermined phase angle $\theta$. In order to
determine $\theta$, we add to OA a probe vector AB, whose amplitude
as well as phase angle $\alpha$ are known. The resultant vector is OB,
the amplitude of which can also be experimentally measured. By using
the probe AB we aim at obtaining information regarding the unknown
phase angle $\theta$ of OA in terms of the measured amplitudes OA and
OB as well as the amplitude AB and phase angle $\alpha$ of the probe
vector AB.


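... only the magnitude of the ... In other words we do not know ...
Knowing the phase $\alpha$ of AB, the phase angle $\theta$ of OA can
be either of the two values given by

$$\theta = \alpha \pm \varphi \qquad (2)$$

where $\varphi$ is the angle between the vectors OA and AB, which is
fixed by the magnitudes of the three vectors (Fig. 1).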

The above equation indicates that the amplitude change occurring 
on addition of a known vector gives information only about the 
component of the unknown vector in the direction of the probe vector. 
By probing OA by two known but non-collinear vectors and observing
the resulting amplitudes, one should be able to obtain the phase angle 
of OA unambiguously. 
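
The two-probe argument can be checked numerically. The sketch below is an illustration only: the function names and the synthetic amplitudes are assumptions, not part of the original lecture. It uses the triangle relation $|OB|^2 = |OA|^2 + |AB|^2 + 2|OA||AB|\cos(\theta - \alpha)$ to obtain the two candidate phases from one probe, and keeps the candidate that a second, non-collinear probe also supports.

```python
import cmath
import math

def candidate_phases(oa, ab, ob, alpha):
    """Two possible phases of OA given |OA|, |AB|, |OB| and the probe phase alpha."""
    # |OB|^2 = |OA|^2 + |AB|^2 + 2 |OA| |AB| cos(theta - alpha)
    cos_phi = (ob ** 2 - oa ** 2 - ab ** 2) / (2.0 * oa * ab)
    cos_phi = max(-1.0, min(1.0, cos_phi))  # guard against rounding error
    phi = math.acos(cos_phi)
    return alpha + phi, alpha - phi

def resolve_phase(oa, probe1, probe2):
    """Return the candidate shared by two probes; each probe is (|AB|, alpha, |OB|)."""
    best = None
    for t1 in candidate_phases(oa, probe1[0], probe1[2], probe1[1]):
        for t2 in candidate_phases(oa, probe2[0], probe2[2], probe2[1]):
            # Compare angles on the unit circle so that 2*pi wrap-around is harmless.
            gap = abs(cmath.exp(1j * t1) - cmath.exp(1j * t2))
            if best is None or gap < best[0]:
                best = (gap, t1)
    return best[1]

# Synthetic data (assumed values): the "unknown" phase is 1.0 rad.
theta_true = 1.0
oa = 10.0
OA = cmath.rect(oa, theta_true)
probes = []
for ab, alpha in ((3.0, 0.3), (2.0, 2.0)):
    ob = abs(OA + cmath.rect(ab, alpha))  # the only quantity an experiment would give
    probes.append((ab, alpha, ob))

theta = resolve_phase(oa, probes[0], probes[1])
```

With a single probe both `alpha + phi` and `alpha - phi` fit the measured amplitude equally well; only the second probe separates them.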


This is the principle used in the experimental solution of the
phase problem in macromolecules by the use of multiple isomorphous
series (1) and anomalous scattering methods (2,3). In the isomorphous
series method, intensity data are collected from the parent crystal
as well as from isomorphous derivative crystals in which a few
additional heavy atoms are bound at specific sites on the
macromolecule. The probe vector here is then the scattering vector
from the additional heavy atoms and is known if the location and
scattering parameters of these atoms are also known. In the use of
anomalous scattering techniques one exploits the fact that the heavy
atom substituents in the derivative crystals are in general anomalous
scatterers and for suitable wavelengths give a component which has a
phase advance. The out of phase component results in the structure
amplitudes $F(h)$ and $F(\bar h)$ of the Friedel pairs of reflections
being unequal in non-centrosymmetric crystals, a difference which,
though small, can be experimentally measured. In the anomalous
scattering method it is this out of phase component that acts as the
probe vector and yields phase information from the Bijvoet differences.
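
A small numerical illustration of the Friedel-pair inequality follows. It is a sketch under stated assumptions: a one-dimensional cell, point atoms, and invented scattering factors and coordinates. The heavy atom is given a complex scattering factor $f' + if''$; setting $f'' = 0$ restores the equality of $|F(h)|$ and $|F(\bar h)|$.

```python
import cmath
import math

def structure_factor(h, atoms):
    """F(h) = sum_j f_j exp(2 pi i h x_j) for a one-dimensional cell."""
    return sum(f * cmath.exp(2j * math.pi * h * x) for f, x in atoms)

def bijvoet_difference(h, atoms):
    """|F(h)| - |F(-h)| for the given atom list of (scattering factor, x)."""
    return abs(structure_factor(h, atoms)) - abs(structure_factor(-h, atoms))

# Assumed, illustrative values: three light atoms and one heavy atom.
light = [(6.0, 0.13), (7.0, 0.41), (8.0, 0.78)]
heavy_normal = [(70.0 + 0.0j, 0.25)]      # f'' = 0, no anomalous component
heavy_anomalous = [(70.0 + 8.0j, 0.25)]   # f'' = 8, out of phase component present

diff_without = bijvoet_difference(3, light + heavy_normal)
diff_with = bijvoet_difference(3, light + heavy_anomalous)
```

The arrangement is non-centrosymmetric, so `diff_with` is non-zero while `diff_without` vanishes to rounding error.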
Fig. 2 shows the relationships between the various scattering vectors
involved, and their amplitude relationships are shown in Fig. 3, where
the vector triangle has been reflected on the real axis.


Fig. 3 


Here also two solutions, corresponding to construction of the phase
triangles on either side of the probe vector, are possible and are
symmetrical with respect to it. However, unlike the isomorphous
replacement case, the two solutions correspond to different values
for the amplitude $F_P$ of the native protein.


The evaluation of the protein phase angles and their refinement
will be the subject matter of another lecture and hence will not be
discussed here. However, it may be mentioned that in practice the
determination of the heavy atom parameters and the protein phase
evaluation are linked: during any actual analysis these two steps
proceed in a series of interleaved cycles, the phase angle knowledge
at any stage affecting the accuracy of the heavy atom parameters and
vice versa.


Location of heavy atoms. 


Heavy atom scattering vector. The first step in applying the
isomorphous series method is the determination of the nature and
position of the heavy atoms in the derivative crystals. Once these
parameters are known the probe vector $F_H$ can be calculated both in
magnitude and direction. If these atoms are also anomalous scatterers
then the individual atomic scattering factors $f_j$ and the total
heavy atom structure factor $F_H$ are both complex numbers. They can
further be split into a component in phase with the incident beam and
another out of phase component. Thus, we can write

$$f_j = f'_j + i f''_j$$

and

$$F_H(h) = F'_H(h) + F''_H(h) \qquad (3)$$

where

$$F'_H(h) = \sum_j f'_j(h) \exp 2\pi i (h \cdot r_j) = F'_H e^{i\alpha'_H} \qquad (4)$$

and

$$F''_H(h) = i \sum_j f''_j(h) \exp 2\pi i (h \cdot r_j) = F''_H e^{i\alpha''_H} \qquad (5)$$

In the above equations $h$ defines the reciprocal lattice vector for
which the values are given, $r_j$ the position vector of atom j, and
$f'$ and $f''$ are the real and imaginary components of the atomic
scattering factors. Knowing $r_j$ and $f_j$ of all substituent atoms,
the phase angles $\alpha'_H$ and $\alpha''_H$ corresponding to the
components $F'_H$ and $F''_H$ can be evaluated from equations 4 and 5.

We shall now consider how the positions of the substituent atoms can
be obtained.

The structure factors $F_H$, $F_P$ and $F_{PH}$ of the heavy atoms,
the protein and the derivative are related by the vector equation

$$F_H = F_{PH} - F_P \qquad (6)$$

The situation here is the converse of what is shown in Fig. 1. Here,
the aim is to obtain the difference vector $F_H$ from the observed
amplitudes of $F_{PH}$ and $F_P$. From an estimate of $F_H$ we can
obtain the nature and location of the atoms which give rise to $F_H$
by Fourier methods. However, the magnitudes of $F_{PH}$ and $F_P$ at
best give only the length of $F_H$, and hence the atomic parameters
themselves must be obtained from these by the application of Patterson
or direct methods.


Length of $F_H$. From the vector triangle OAB of Fig. 1 we get

$$OB = OA \cos AOB + AB \cos ABO \qquad (7)$$

as the length of vector OB. Considering the phase triangles occurring
in the case of isomorphous series and anomalous scattering differences
and neglecting small terms (5,6,7) we can write

$$F_{PH}(h) - F_P(h) = F'_H(h) \cos(\alpha_{PH} - \alpha'_H)$$

$$F_{PH}(h) - F_{PH}(\bar h) = 2 F''_H(h) \cos(\alpha_{PH} - \alpha''_H) \qquad (8)$$

which express the measured amplitude differences in terms of the heavy
atom parameters and the phase angles of the derivatives. Knowing all
the parameters, the above equations imply a complete solution of
the phase angle $\alpha_{PH}$.


Further simplification can be obtained if all the substituent atoms
have the same $(f''/f')$ value, which will result in the phase of the
imaginary component being $\pi/2$ in advance of the real component,
resulting in

$$\alpha''_H = \alpha'_H + \pi/2, \qquad F''_H = F'_H / k, \qquad k = f'/f''$$

and leads to the amplitude difference expressions

$$\Delta F_{iso} = F_{PH} - F_P = F'_H \cos(\alpha_{PH} - \alpha'_H)$$

and

$$\Delta F_{ano} = \frac{k}{2}\left[F_{PH}(h) - F_{PH}(\bar h)\right] = F'_H \sin(\alpha_{PH} - \alpha'_H) \qquad (9)$$

We also get

$$F'_H = \left(\Delta F_{iso}^2 + \Delta F_{ano}^2\right)^{1/2}$$

$$\alpha_{PH} = \cos^{-1}(\Delta F_{iso}/F'_H) + \alpha'_H = \sin^{-1}(\Delta F_{ano}/F'_H) + \alpha'_H$$

The heavy atom scattering vector can be represented in terms of
the isomorphous series and anomalous scattering amplitude differences
as

$$F'_H\, e^{i\alpha'_H} = (\Delta F_{iso} - i\,\Delta F_{ano}) \exp(i\alpha_{PH})$$

from which it is seen that the heavy atom configuration may be
obtained as a Fourier series from the measured differences if the
phases $\alpha_{PH}$ are known. However, in the initial stages of the
analysis no phase information may be available for this method of
locating the heavy atoms.


Heavy atom vector maps. The amplitude differences $\Delta F_{iso}$
and $\Delta F_{ano}$ may be either positive or negative for the
different reflections but their magnitudes can be large only if the
corresponding heavy atom vector $F_H$ is also large. For
centrosymmetric projections the relationship between the heavy atom
scattering vector and the structure amplitudes is straightforward and
hence the earlier studies were mainly restricted to this small class
of reflections. However, if accurate anomalous scattering measurements
are also available, then good estimates of the length of the heavy
atom vector $F_H$ can be obtained by appropriate combination (6) of
the differences. Vector maps computed with the square of the amplitude
differences have been used by many investigators (9,10,11) and have
become one of the first essential steps in any protein structure
analysis. It can easily be seen from the simplified expression given
in equation 9 that

$$(\Delta F_{iso})^2 = F'^2_H \cos^2(\alpha_{PH} - \alpha'_H) = \tfrac{1}{2} F'^2_H + \tfrac{1}{2} F'^2_H \cos 2(\alpha_{PH} - \alpha'_H) \qquad (12)$$

Inspection of equation (12) shows that the first term on the right
hand side is the appropriate coefficient for computing a vector map of
the heavy atom structure and yields positive peaks of height
$\tfrac{1}{2} f_i f_j$ at locations $\pm(r_i - r_j)$. The second term,
however, depends not only on the heavy atom configuration, but also on
the protein density distribution and hence leads to a background of
positive and negative peaks which tends to mask the features of the
heavy atom vector map.

Similar maps could be constructed from the $(\Delta F_{ano})^2$
values, which also can be shown to consist of heavy atom vector peaks
superimposed on a fluctuating background of positive and negative
peaks in a way anticomplementary to those occurring in the isomorphous
series map. Hence, appropriate combinations of the two differences are
better in terms of cancelling unwanted background features while
accentuating the required heavy atom vector peaks. In using the
amplitude differences, it is essential to remember that these are in
general small differences between two large measurements which are
subject to experimental errors, and hence appropriate error analysis
and weighting procedures are essential. Systematic errors due to lack
of isomorphism, absorption of x-rays etc. should be carefully
evaluated, and sometimes local scaling (13) procedures may
significantly improve the results. In general, the vector maps of
anything but simple groupings of a few heavy atom substituents are
difficult to analyse in an essentially complete manner.
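
The idea that a map computed from squared amplitudes shows interatomic vectors rather than atomic positions can be illustrated in one dimension. This is a sketch only: the two heavy atom sites, the resolution limit and the grid are invented, and the exact heavy atom amplitudes are used as coefficients in place of the measured differences discussed above, so the protein-dependent background is absent.

```python
import cmath
import math

def heavy_atom_amplitudes(sites, h_max):
    """|F_H(h)| for point atoms given as (weight, x) in a one-dimensional cell."""
    return {h: abs(sum(w * cmath.exp(2j * math.pi * h * x) for w, x in sites))
            for h in range(1, h_max + 1)}

def vector_map(coeffs_sq, n_grid):
    """P(u) = 2 sum_h c_h cos(2 pi h u), sampled at u = k / n_grid (h = 0 term omitted)."""
    return [2.0 * sum(c * math.cos(2.0 * math.pi * h * k / n_grid)
                      for h, c in coeffs_sq.items())
            for k in range(n_grid)]

# Two hypothetical heavy atom sites separated by 0.30 of the cell edge.
sites = [(1.0, 0.10), (1.0, 0.40)]
amps = heavy_atom_amplitudes(sites, h_max=15)
pmap = vector_map({h: a * a for h, a in amps.items()}, n_grid=100)

# Strongest peak away from the origin region of the map.
peak = max(range(10, 91), key=lambda k: pmap[k])
```

The map peaks at u = 0.30 and, by symmetry, at u = 0.70, which is the vector between the two sites and not either site itself.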


In most macromolecular structures investigated so far, the
calculation and inspection of vector maps to locate heavy atoms has
been one of the first essential steps in the successful analysis.
Both $(\Delta F_{iso})^2$ and $(\Delta F_{ano})^2$ as well as suitably
weighted combinations are usually examined. Some proteins contain
heavy atoms like iron, which show appreciable anomalous scattering
when irradiated by x-rays of ordinary wavelengths. In such cases it is
possible to use $(\Delta F_{ano})^2$ type maps from the native protein
data to locate these atoms by accurate measurement of Bijvoet
differences, as was done in the case of calf liver cytochrome b5.


Correlation of heavy atom positions. While the relative configuration
of the substituent atoms in a single derivative can be obtained from
vector maps, in many space groups their co-ordinates are not given
with reference to a unique origin. For using multiple isomorphous
series for protein phase evaluation, it is essential that all the
heavy atom co-ordinates in the different derivatives be referred to a
common origin in the unit cell. In space groups like P6, where the
choice of origin might involve a variable parameter, additional
complications may arise. Many vector methods have been proposed to do
this, and here ... isomorphous derivative and anomalous scattering
data can greatly simplify the interpretation of the results.

However, if approximate initial protein phases $\alpha_P$ are
available, ... isomorphous series and anomalous scattering ... at the
substituent heavy atoms in the ...
... above equation being chosen which gives the smaller difference
between $|F_H|_{obs}$ and $|F_H|_{calc}$. This method, however, in
addition to using only a very small part of the data, was also not
able to refine some of the parameters, such as the y parameter in
space group P2₁. Use of anomalous scattering information can give a
better estimate of $|F_H|_{obs}$ against which the heavy atom
parameters could be refined from the expression

$$|F_H|^2_{obs} = (\Delta F_{iso})^2 + 2\,|F_P|\,|F_{PH}| \left\{ 1 - \left[ 1 - (\Delta F_{ano}/|F_P|)^2 \right]^{1/2} \right\}$$

which gives the length of the heavy atom vector in terms of the
measured quantities.

For ... refinement it is essential ... range of 25 to 45% and as such
is considerably higher than in small molecule studies. The actual
agreement in any particular case depends on accuracy of data, ...
heavy atoms, resolution of data, degree of isomorphism between native
protein and derivative, molecular weight of protein etc.
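
The estimate of the heavy atom vector length from measured amplitudes can be tried on synthetic data. The sketch below makes several assumptions that are not in the source: the derivative amplitude $|F_{PH}|$ is taken as the mean of the Bijvoet pair, $\Delta F_{ano}$ is scaled by $k/2$ with $k = f'/f''$, and the numerical values are invented. It evaluates $|F_H|^2 = (\Delta F_{iso})^2 + 2|F_P||F_{PH}|\{1 - [1 - (\Delta F_{ano}/|F_P|)^2]^{1/2}\}$.

```python
import cmath
import math

def heavy_atom_length(f_p, f_ph_plus, f_ph_minus, k):
    """|F_H| from the native amplitude and the Bijvoet pair of the derivative."""
    f_ph = 0.5 * (f_ph_plus + f_ph_minus)          # mean derivative amplitude
    d_iso = f_ph - f_p
    d_ano = 0.5 * k * (f_ph_plus - f_ph_minus)
    root = math.sqrt(max(0.0, 1.0 - (d_ano / f_p) ** 2))
    return math.sqrt(d_iso ** 2 + 2.0 * f_p * f_ph * (1.0 - root))

# Synthetic reflection (assumed values): |F_P| = 100, |F'_H| = 10, k = 8.
k = 8.0
F_P = cmath.rect(100.0, 0.0)
F_H_real = cmath.rect(10.0, math.radians(60.0))
F_H_imag = 1j * F_H_real / k                       # out of phase part, pi/2 in advance
f_ph_plus = abs(F_P + F_H_real + F_H_imag)         # |F_PH(h)|
f_ph_minus = abs(F_P + F_H_real - F_H_imag)        # |F_PH(-h)|

estimate = heavy_atom_length(abs(F_P), f_ph_plus, f_ph_minus, k)
simple = math.hypot(0.5 * (f_ph_plus + f_ph_minus) - abs(F_P),
                    0.5 * k * (f_ph_plus - f_ph_minus))
```

For this reflection `estimate` lies closer to the true value of 10 than `simple`, the plain root-sum-of-squares of the two differences, because the curvature of the phase triangle is no longer neglected.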


Another refinement procedure (18) is to use the protein phase
calculated from the remaining derivatives in defining a lack of
closure

$$E_j = F_{PH}(obs) - F_{PH}(calc)$$

and minimizing $\sum w E_j^2$, the sum over all reflections, by the
usual least squares methods. Here the weight to be used for each
reflection is $w = 1/E$, where $E = \langle E_j^2 \rangle^{1/2}$, i.e.
the inverse of the root mean square of the lack of closure over all
derivatives used in calculating the protein phase angle. Different
kinds of reliability indices have been suggested and have been found
helpful in following the course of parameter refinement.
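
A minimal sketch of the lack-of-closure quantity is given below. The helper names, the unit weights and the synthetic reflections are assumptions for illustration; a real refinement would vary the heavy atom parameters that generate the calculated heavy atom vectors and would use the weights described above.

```python
import cmath

def lack_of_closure(f_ph_obs, f_p, alpha_p, f_h_calc):
    """E = |F_PH|(obs) - |F_P exp(i alpha_P) + F_H(calc)| for one reflection."""
    return f_ph_obs - abs(cmath.rect(f_p, alpha_p) + f_h_calc)

def weighted_residual(reflections, weights, f_h_model):
    """Sum of w * E^2 over reflections given as (|F_PH| obs, |F_P|, alpha_P)."""
    total = 0.0
    for (f_ph_obs, f_p, alpha_p), w, f_h in zip(reflections, weights, f_h_model):
        e = lack_of_closure(f_ph_obs, f_p, alpha_p, f_h)
        total += w * e * e
    return total

# Synthetic data (assumed): three reflections with known protein phases.
true_f_h = [cmath.rect(10.0, 0.4), cmath.rect(6.0, 2.1), cmath.rect(8.0, -1.0)]
natives = [(100.0, 0.9), (80.0, -0.3), (120.0, 2.5)]
reflections = [(abs(cmath.rect(fp, ap) + fh), fp, ap)
               for (fp, ap), fh in zip(natives, true_f_h)]
weights = [1.0, 1.0, 1.0]

exact = weighted_residual(reflections, weights, true_f_h)
perturbed = weighted_residual(reflections, weights, [1.2 * fh for fh in true_f_h])
```

The residual vanishes for the correct heavy atom model and grows when the model is perturbed, which is the quantity a least squares procedure would drive down.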
Fig. 1. The probe vector AB of known magnitude and phase is added to
the unknown vector OA to produce the resultant OB. The angle
$\varphi$ between vectors OA and AB can be obtained from the
magnitudes of the three vectors.

Fig. 2. Relationship between the structure factor vectors of Friedel
pairs of reflections. The amplitudes of $F(h)$ and $F(\bar h)$ are, in
general, different if there are anomalous scatterers in the unit cell.

Fig. 3. Effect of the out of phase component $F''_H$ on the amplitudes
$F(h)$ and $F(\bar h)$. The angle between $F(h)$ and $F''_H$ can be
calculated from these amplitudes.
EXERCISE

The following diagrams represent sections from Patterson maps for a
ribonuclease A platinum derivative to a resolution of 3 Å. The
relevant information is given below:

a = 30.13 Å        space group P2₁
b = 38.11 Å
c = 53.29 Å        Z = 2
β = 105.75°

The coefficients used for calculating the maps are

a, $(|F_{PH}| - |F_P|)^2 = A$

b, $(|F_{PH}(h)| - |F_{PH}(\bar h)|)^2 = B$

c, $W_1 \times A + W_2 \times B$, where $W_1$ and $W_2$ are weights.

The maps are drawn as sections perpendicular to the b-axis, with c/2
horizontal and a down. Two sets of maps corresponding to y = 0 and
y = 1/2 show all main peaks in the maps.

Find the number and locations of the heavy atom substituents.

[Map sections: Section y = 1/2; Section y = 0]
Summary

A least-squares atomic parameter refinement method is described
which makes use of the Fast-Fourier transform (FFT) algorithm at all
stages of the computation. Therefore, the computational requirement
is proportional to N log N, where N is the number of reflections,
making it very attractive for large structures such as proteins. The
method has a radius of convergence of approximately 0.75 Å, making it
attractive for a small structure also. Computational and programming
considerations are described. Results of using the method on several
structures are summarised.


Introduction

Recent results on the refinement of protein crystal structures with
high resolution data (Huber, Kukla, Bode, Schwager, Bartels,
Deisenhofer & Steigemann, 1974; Freer, Alden, Carter & Kraut, 1975;
Moews & Kretsinger, 1975; Adman, Sieker & Jensen, 1975; Deisenhofer &
Steigemann, 1975; Bode & Schwager, 1975; Chambers & Stroud, 1977;
Takano 1977; Isaacs & Agarwal, 1978) have shown that refinement
markedly improves the accuracy of the structure and the quality of the
electron density map. With the exception of rubredoxin (Watenpaugh,
Sieker, Herriott & Jensen, 1973) all the refinements listed here prior
to 1978 were performed using either the real space method of Diamond
(1971, 1974) or difference Fourier methods (see for example
Watenpaugh et al. 1973). Rubredoxin has been extensively refined with
difference-Fourier methods followed by block-diagonal least-squares
refinement using a conventional program. This result, and the extra
structural information (particularly the water structure) which
resulted from this refinement, showed the validity and value of
refining protein structures by least-squares methods.

The principal obstacles to the routine use of least-squares
refinement for protein structures are (1) the paucity of diffraction
data relative to that for small molecules, and (2) the enormous
computing cost.
The paucity of diffraction data is characteristic of most protein
crystals. The intensity of the diffracted radiation is reduced by the
large unit cell volume, and the extent of the scattering is further
reduced by the large amount of disordered solvent. Few protein
crystals have had data measured to a resolution of 1. ... Å. The
reliability and accuracy of least-squares refinement reduces as the
ratio of the number of observations to the number of parameters
reduces. For practical purposes this means that diffraction data to a
resolution of > 2 Å is required for a meaningful refinement. ...
studies may be able to give high resolution data for ... proteins.

The computing cost arises from the number of calculations required
for least-squares refinement. For a full matrix least-squares
calculation the computing required is proportional to NM², where N is
the number of reflections and M the number of parameters. For the
simplest diagonal least-squares calculation, the requirement is
proportional to NM, since a derivative of each calculated structure
factor has to be computed for each variable parameter.
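
To give a feeling for these proportionalities, the following sketch compares the three scalings for one invented problem size. The values of N and M are assumptions chosen only for illustration, the constants of proportionality are ignored, and the base of the logarithm in the N log N figure quoted in the Summary is taken as 2.

```python
import math

def operation_counts(n_reflections, n_parameters):
    """Relative work for the three schemes, ignoring constant factors."""
    return {
        "full_matrix": n_reflections * n_parameters ** 2,   # proportional to N M^2
        "diagonal": n_reflections * n_parameters,            # proportional to N M
        "fft": n_reflections * math.log2(n_reflections),     # proportional to N log N
    }

# Hypothetical protein-sized problem.
counts = operation_counts(n_reflections=20000, n_parameters=8000)
ratio_full_to_diagonal = counts["full_matrix"] / counts["diagonal"]
ratio_diagonal_to_fft = counts["diagonal"] / counts["fft"]
```

The full matrix calculation exceeds the diagonal one by a factor of M, which is why the choice of algorithm dominates the cost for structures with thousands of parameters.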


It is possible to circumvent a lack of data by either reducing the
number of parameters or by adding additional observations into the
calculation. The number of parameters may be conveniently reduced by
treating groups of atoms as "rigid bodies" with their positions
described by three rotational and three translational parameters.
Alternatively, the number of observations may be increased by
including information in the form of constraints on the known
geometry (... valence angles) of peptides. Least-squares ... which use
one (Konnert 1976) ... (Sussman, Holbrook, Church & Kim, 1977) of
these approaches, or use least-squares coupled with potential energy
minimization (Jack & Levitt, 1978) ... They have been used with
remarkabl... low resolution data is available, but both the ... are
affected (altho... ...ure (Schmidt ... Jack and Levitt (1978) use ...
Fourier ... (Agarwal 1978), ...

This problem of cost required a new approach to least-squares
refinement, and this was ... (Cooley and Tukey 1965; Ten Eyck ...;
Winograd 1978) the algorithm is extremely fast, and the computation
required is proportional to N log N, where N is the number of
reflections.
Details of the method and results of its testing on several
structures are contained in Agarwal (1978). Details of its application
are contained in Isaacs and Agarwal (1978). Although most useful for
large structures, it is applicable to small structures because of its
large radius of convergence (0.75 Å) and reduced computational
requirement. Results of its application to a small structure are
discussed in Agarwal (1978). Since the method has been discussed in
detail earlier, we will confine ourselves to a tutorial discussion.


The Method 


In the least-squares refinement of atomic parameters the function
minimized is

$$R = \sum_{hkl} W_{hkl} \left( |F_{calc}(hkl)| - |F_{obs}(hkl)| \right)^2$$

where $W_{hkl}$ is a weighting function. This function is to be
minimized with reference to atomic parameters. The corrections to the
parameters are obtained from the matrix equation

$$\Delta u = -H^{-1} G$$

where $\Delta u_i$ is the correction to be applied to the ith
parameter and $H^{-1}$ is the inverse of the normal matrix whose
general term is

$$H_{ij} = \sum_{r=1}^{N} W_r \, \frac{\partial |F_c(r)|}{\partial p_i} \, \frac{\partial |F_c(r)|}{\partial p_j}$$

where N is the number of reflections and $W_r$ is a weighting
function. G is the gradient vector (derivatives) of general form

$$G_i = \sum_{r=1}^{N} W_r \, (\Delta F(r)) \, \frac{\partial |F_c(r)|}{\partial p_i}$$

The size of the normal matrix is M×M, where M is the number of
parameters, and the length of the gradient vector is M. The
calculation of the gradient vector is proportional to NM and that of
the normal matrix is proportional to NM².
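
The matrix equation can be followed on a toy problem. The sketch below is not the FFT method of this paper: it is a conventional full-matrix step on an invented one-dimensional structure, with unit weights, one atom held fixed to define the origin, and two refined coordinates, so that the 2×2 normal matrix can be inverted by hand. All names and values are assumptions.

```python
import cmath
import math

FIXED = (4.0, 0.0)        # (scattering weight, x) of an atom held fixed at the origin
WEIGHTS = (1.0, 2.0)      # scattering weights of the two refined atoms
H_RANGE = range(1, 9)     # reflections h = 1..8

def f_calc(h, xs):
    """Calculated structure factor for the fixed atom plus the refined atoms."""
    total = FIXED[0] * cmath.exp(2j * math.pi * h * FIXED[1])
    for w, x in zip(WEIGHTS, xs):
        total += w * cmath.exp(2j * math.pi * h * x)
    return total

def refine_step(xs, f_obs):
    """One step of du = -H^-1 G for the two positional parameters (W_r = 1)."""
    H = [[0.0, 0.0], [0.0, 0.0]]
    G = [0.0, 0.0]
    for h in H_RANGE:
        fc = f_calc(h, xs)
        phase = cmath.phase(fc)
        delta = abs(fc) - f_obs[h]                     # Delta F(r)
        # d|Fc|/dx_m = Re[ exp(-i phase) * dFc/dx_m ]
        d = [(cmath.exp(-1j * phase) * w * 2j * math.pi * h
              * cmath.exp(2j * math.pi * h * x)).real
             for w, x in zip(WEIGHTS, xs)]
        for i in range(2):
            G[i] += delta * d[i]
            for j in range(2):
                H[i][j] += d[i] * d[j]
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    du0 = -(H[1][1] * G[0] - H[0][1] * G[1]) / det     # explicit 2x2 inverse
    du1 = -(H[0][0] * G[1] - H[1][0] * G[0]) / det
    return [xs[0] + du0, xs[1] + du1]

x_true = [0.200, 0.370]
f_obs = {h: abs(f_calc(h, x_true)) for h in H_RANGE}   # error-free "observations"
xs = [0.205, 0.365]                                     # starting model
for _ in range(10):
    xs = refine_step(xs, f_obs)
```

Every reflection contributes to every element of H, which is the origin of the NM² cost; the gradient needs only one pass per parameter, hence NM.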


There are three major computational steps in the refinement
procedure. These are the calculation of structure factors, of the
gradient vector, and of the normal matrix and its inverse. We briefly
discuss how these can be calculated using FFT.
Calculation of Structure Factors

Structure factors are calculated by Fourier inversion of a model
electron density. The use of the basic method was discussed by Sayre
(1951) nearly thirty years ago. However, it is only with the FFT that
the method has become viable for protein structures. Computationally
there are two distinct stages in calculating structure factors. The
first stage is to calculate the electron density of the structure at
each point on a uniform grid parallel to the cell axes. The second is
the Fourier inversion of this electron density map to obtain the
structure factor magnitudes and phases. The computation required for
the first stage depends on the fineness of the sampling interval
throughout the cell, or the number of grid points, and the distance
from the centre of each atom for which the electron density is to be
computed.
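
The two stages can be sketched for a one-dimensional cell. Everything in the block is an illustrative assumption: the atoms are single Gaussians with invented weights and widths, the density is cut off six standard deviations from each atom centre, and a direct discrete Fourier sum stands in for the FFT so that the example needs only the standard library. An FFT routine would return the same values for all h in a single call.

```python
import cmath
import math

def sample_density(atoms, n_grid, cutoff_sigmas=6.0):
    """Stage 1: sum Gaussian atoms (weight, x, sigma) onto a uniform grid over one cell."""
    rho = [0.0] * n_grid
    for weight, x0, sigma in atoms:
        reach = int(math.ceil(cutoff_sigmas * sigma * n_grid))   # radius in grid steps
        centre = int(round(x0 * n_grid))
        norm = weight / (sigma * math.sqrt(2.0 * math.pi))
        for k in range(centre - reach, centre + reach + 1):
            dx = k / n_grid - x0
            rho[k % n_grid] += norm * math.exp(-0.5 * (dx / sigma) ** 2)
    return rho

def fourier_inversion(rho, h):
    """Stage 2: F(h) = (1/n) sum_k rho_k exp(2 pi i h k / n)."""
    n = len(rho)
    return sum(r * cmath.exp(2j * math.pi * h * k / n) for k, r in enumerate(rho)) / n

def analytic(atoms, h):
    """Reference value: Gaussian form factor times the usual phase term."""
    return sum(w * math.exp(-2.0 * (math.pi * s * h) ** 2) * cmath.exp(2j * math.pi * h * x)
               for w, x, s in atoms)

atoms = [(6.0, 0.12, 0.03), (8.0, 0.55, 0.04)]     # hypothetical (weight, x, sigma)
rho = sample_density(atoms, n_grid=64)
worst = max(abs(fourier_inversion(rho, h) - analytic(atoms, h)) for h in range(0, 9))
```

The cost of the first stage is set by `n_grid` and `cutoff_sigmas`, the sampling interval and atom radius mentioned above; a coarser grid or a smaller radius is faster but increases `worst`.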


The most expensive part of the structure factor calculation is
setting up the model atom electron density, which is the Fourier
transform of the atom scattering factor curve corrected for thermal
motion. The isotropic thermal motion of the atoms is represented as a
Gaussian function $\exp(-Bs^2/4)$, where s is $2\sin\theta/\lambda$.
If the atom scattering factor curve is also defined as a Gaussian
function then the product of these two Gaussian functions is another
Gaussian function whose Fourier transform is also Gaussian.

Both Ten Eyck (1977) and Agarwal (1978) have given the formulae to
calculate the atom electron density. The speed of this computation
depends on the number of terms in the Gaussian function, ... range of
$\sin\theta$ ... for data limited to low resolution ... (Agarwal 1978)
or three Gaussian terms for higher resolution data (... Ten Eyck
1977). Reducing the radius of the atom will ... calculation ... given
radius depends on ... Agarwal (1978) has comp... radii and temperature
factors so ... be chosen.


Calculation of the Gradient Vector

Agarwal (1978) has derived the expression for the gradient vector
with respect to the x coordinate of the mth atom

$$G(x_m) = \sum_s g_m(s)\,(-i2\pi h)\, W(s)\, E(s)\, \exp(i\varphi(s))\, \exp(-i2\pi\, s \cdot r_m)$$

where

$g_m(s) = f_m(s) \exp(-B_m s^2/4)$ = contribution of the mth atom to
the structure factors

$|s| = 2\sin\theta/\lambda$

$W(s) = W(hkl)$ = weighting function

$E(s) = |F_{calc}(hkl)| - |F_{obs}(hkl)|$

$s \cdot r_m = h x_m + k y_m + l z_m$

$\varphi(s)$ = phase of $F_{calc}(hkl)$

Similar expressions hold for $G(y_m)$, $G(z_m)$ and $G(B_m)$ with the
term $(-i2\pi h)$ replaced by $(-i2\pi k)$, $(-i2\pi l)$ and
$(-s^2/4)$ respectively.


$G(x_m)$ may be rewritten as

$$G(x_m) = \sum_s D_x(s)\, g_m(s)\, \exp(-i2\pi\, s \cdot r_m)$$

where $D_x(s) = (-i2\pi h)\, W(s)\, E(s)\, \exp(i\varphi(s))$.

$G(x_m)$, then, is the Fourier transform of the product of two
functions $D_x(s)$ and $g_m(s)$ evaluated at $r_m$ (the position of
the mth atom). According to the convolution theorem, multiplication in
reciprocal space is equivalent to convolution in real space. The
Fourier transform of $g_m(s)$ is the electron density of the atom
$\rho_m(r)$, and the Fourier transform of $D_x(s)$, which we shall
call $d_x(r)$, is a modified difference density map. The gradient then
is computed by the summation

$$G(x_m) = \sum_r d_x(r)\, \rho_m(r - r_m)$$

The computation of all the x derivatives requires the calculation of
the modified difference density map, $d_x(r)$, by FFT followed by an
integration of $d_x(r)$ with the electron density function for each
atom. If the atom electron density is assumed to be zero outside a
radius $rad_m$ from the atom centre $r_m$, the summation need only be
carried out within this radius for each atom. Separate difference
density functions have to be computed for gradients with reference to
y, z and B.
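
The equivalence of the reciprocal-space sum and the real-space summation can be verified numerically. The sketch below is a one-dimensional illustration under assumptions not in the source: Gaussian atoms of equal width, unit weights W(s) = 1, invented coordinates, a direct Fourier sum in place of the FFT, and a real-space sum over the whole grid rather than within a cut-off radius. With these conventions the real-space sum carries a factor 1/n (the grid volume element), which the formula above absorbs.

```python
import cmath
import math

H_MAX, N_GRID = 8, 64
SIGMA = 0.03                       # Gaussian atom width in fractional units
WEIGHTS = (6.0, 8.0)               # scattering weights of the two atoms

def g(h):
    """Form factor including the thermal term, per unit weight (Gaussian in h)."""
    return math.exp(-2.0 * (math.pi * SIGMA * h) ** 2)

def f_calc(h, xs):
    return sum(w * g(h) * cmath.exp(2j * math.pi * h * x) for w, x in zip(WEIGHTS, xs))

def rho_atom(dx):
    """Real-space density of a unit-weight atom at distance dx (nearest image only)."""
    dx -= round(dx)
    return math.exp(-0.5 * (dx / SIGMA) ** 2) / (SIGMA * math.sqrt(2.0 * math.pi))

x_obs, x_model = [0.20, 0.37], [0.21, 0.36]     # "true" and current coordinates
hs = [h for h in range(-H_MAX, H_MAX + 1) if h != 0]

# D_x(s) = (-i 2 pi h) W(s) E(s) exp(i phi(s)) with W = 1
D = {}
for h in hs:
    fc = f_calc(h, x_model)
    E = abs(fc) - abs(f_calc(h, x_obs))
    D[h] = (-2j * math.pi * h) * E * cmath.exp(1j * cmath.phase(fc))

m = 0   # gradient with respect to x of the first atom

# (a) Direct reciprocal-space sum.
G_recip = sum(D[h] * WEIGHTS[m] * g(h) * cmath.exp(-2j * math.pi * h * x_model[m])
              for h in hs)

# (b) Modified difference map d_x(r) on the grid, then the real-space sum with rho_m.
d_map = [sum(D[h] * cmath.exp(-2j * math.pi * h * k / N_GRID) for h in hs)
         for k in range(N_GRID)]
G_real = sum(d_map[k] * WEIGHTS[m] * rho_atom(k / N_GRID - x_model[m])
             for k in range(N_GRID)) / N_GRID
```

Route (b) computes `d_map` once for all atoms, after which each atom costs only a short local sum; this is where the saving over a reflection-by-reflection derivative calculation comes from.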


Calculation of the Normal Matrix

The following expressions for the normal matrix term $H(x_m, x_n)$,
corresponding to interactions between $x_m$ and $x_n$, have been
derived.

$$H(x_m, x_n) = H_1(x_m, x_n) + H_2(x_m, x_n)$$

where

$$H_1(x_m, x_n) = \sum_s \tfrac{1}{2}\, g_m(s)\, g_n(s)\, (4\pi^2 h^2)\, W(s)\, \exp(i2\pi\, s \cdot (r_m - r_n))$$

$$H_2(x_m, x_n) = -\sum_s \tfrac{1}{2}\, g_m(s)\, g_n(s)\, (4\pi^2 h^2)\, W(s)\, \exp(i2\varphi(s))\, \exp(-i2\pi\, s \cdot (r_m + r_n))$$

Similar expressions hold for all other elements of the normal matrix,
differing only in that the term $(4\pi^2 h^2)$ is replaced by a
similar term depending on the type of interaction. For example, it is
replaced by $(4\pi^2 k^2)$ for $y_m y_n$ interactions, by
$(4\pi^2 hk)$ for $x_m y_n$ interactions, by $(s^4/16)$ for
$B_m B_n$ interactions, and by $(i\pi h s^2)$ for $x_m B_n$
interactions.


If $A_{xx}(s) = 2\pi^2 h^2 W(s)$, then the $H_1(x_m, x_n)$ terms
represent the Fourier transform of $A_{xx}(s)\, g_m(s)\, g_n(s)$
evaluated at $(r_n - r_m)$, the vector between the two atoms.
$A_{xx}(s)\, g_m(s)\, g_n(s)$ is always real and positive. Its Fourier
transform has a large peak at the origin corresponding to the diagonal
terms, then drops rapidly and alternates in sign as the distance
between the atoms increases. The $H_2(x_m, x_n)$ terms represent the
Fourier transform of $-A_{xx}(s)\, g_m(s)\, g_n(s) \exp(i2\varphi(s))$
evaluated at $(r_m + r_n)$. This involves phase terms so that, unlike
$H_1(x_m, x_n)$, these terms will have no major peaks and their
magnitude distribution is likely to be ... the matrix. As the major
contribution to the normal matrix comes from the $H_1$ terms,
neglecting the $H_2$ terms will not affect the final result, but only
the rate of convergence.

The diagonal terms are

$$H_1(x_m, x_m) = \sum_s A_{xx}(s)\, g_m^2(s)$$

This may be computed directly following the procedure described in
Agarwal (1978). The computation ... number of unique reflections ...

Off diagonal elements can be calculated in a similar manner to the
gradients. We may write

$$H_1(x_m, x_n) = \sum_s A_{xx}(s)\, g_m(s)\, g_n(s)\, \exp(-i2\pi\, s \cdot (r_n - r_m))$$

which is similar to the expression for the gradients except that
$D_x(s)$ is replaced by $A_{xx}(s)$, $g_m(s)$ is replaced by
$g_m(s)\, g_n(s)$ and $r_m$ is replaced by $r_n - r_m$. If $a_{xx}(r)$
is the Fourier transform of $A_{xx}(s)$, and $\rho_{mn}(r)$ is the
joint Gaussian electron density function of the mth and nth atoms,
i.e. the Fourier transform of $g_n(s)\, g_m(s)$, then by the
convolution theorem $H_1(x_m, x_n)$ is the convolution of $a_{xx}(r)$
and $\rho_{mn}(r)$ evaluated at $(r_n - r_m)$, and may be expressed as
the summation

$$H_1(x_m, x_n) = \sum_r a_{xx}(r)\, \rho_{mn}(r - r_n + r_m)$$

The summation is over all the grid points in real space, and
$(r - r_n + r_m)$ is the distance of the grid point from the point
$(r_n - r_m)$. If the joint electron density
$\rho_{mn}(r - r_n + r_m)$ is assumed to be non-zero only for grid
points within some limiting radius from the point $(r_n - r_m)$, then
the summation is much simplified.


Furthermore, if the off-diagonal terms of the matrix are restricted to interactions between closely related atoms (atoms in the same side chain or the same peptide unit, for instance), then a_{xx}(r) is required over a limited volume of real space about the origin and could be computed directly. Since A_{xx}(s) will change only if the weights change, the function a_{xx}(r) may be used in a number of refinement cycles until this happens. Similarly, the joint Gaussian electron density function \rho_{mn}(r - r_n + r_m) will change only if (r_n - r_m), B_m or B_n changes. The calculation of other off diagonal terms is similar, with A_{xx} replaced by the appropriate function as given above.
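The behaviour of the H_1 terms described above can be checked numerically. The following is a minimal one-dimensional sketch, not the program of Agarwal (1978): it assumes Gaussian scattering factors exp(-B s^2/4) for unit-weight atoms, fractional coordinates, and a direct summation over the indices h. The function names and the test values are illustrative assumptions.

```python
import cmath
import math


def gaussian_form_factor(s, b_factor):
    """Assumed Gaussian scattering factor exp(-B s^2 / 4) of a unit-weight atom."""
    return math.exp(-b_factor * s * s / 4.0)


def h1_term(h_indices, cell_length, weights, b_m, b_n, x_m, x_n):
    """Direct summation of H_1(x_m, x_n) in one dimension.

    A_xx(s) = 4 pi^2 h^2 W(s); x_m and x_n are fractional coordinates,
    so that s.(r_n - r_m) reduces to h * (x_n - x_m).
    """
    total = 0.0 + 0.0j
    for h, w in zip(h_indices, weights):
        s = h / cell_length  # reciprocal-space length 1/d
        a_xx = 4.0 * math.pi ** 2 * h * h * w
        g_prod = gaussian_form_factor(s, b_m) * gaussian_form_factor(s, b_n)
        total += a_xx * g_prod * cmath.exp(-2j * math.pi * h * (x_n - x_m))
    # The imaginary parts cancel when h and -h are both in the sum.
    return total.real


if __name__ == "__main__":
    hs = list(range(-20, 21))
    ws = [1.0] * len(hs)
    for dx in (0.0, 0.02, 0.05, 0.10):
        print(dx, h1_term(hs, 30.0, ws, 15.0, 15.0, 0.1, 0.1 + dx))
```

Because every summand A_{xx} g_m g_n is real and non-negative, the diagonal value (zero separation) bounds the magnitude of every off-diagonal value, which is the property that justifies restricting the off-diagonal terms to close neighbours.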


Programming Considerations 


Although the algorithm may appear complex, its programming is relatively straightforward. Agarwal (1978) has described the procedures. The two largest computations are the fast Fourier transforms and the modeling of the atom electron density. In these calculations it is important that [...] done, not only to save computer time but also to save storage.


(a) The Fast-Fourier Transforms

The nature of [...] the transforms are calculated for a complete unit cell, with a complete set of data. Ten Eyck (1973) has shown, however, that symmetry may be utilised to reduce both the [...] and the amount of data required [...]. Savings in time may also be made when [...] systematic absences in general (hkl) reflections.


(b) Modeling the Atom Electron Density

This is the most expensive part of the calculation and it is important that care is taken to optimise the programming. The use of single and double Gaussian approximations to the atom electron density has been discussed by Ten Eyck (1977) and Agarwal (1978).


The only difficulty in the programming is to allow correctly for atoms which lie close to the edges of the asymmetric unit of the cell. This may be approached in two ways. In the first method, which was employed in the original program used for the insulin refinement (space group R3), the coordinates of each atom are transformed by the space group symmetry to lie within the asymmetric unit of the cell required for the transforms. If an atom extends outside this asymmetric unit, then this density is added to the symmetrically equivalent point within the asymmetric unit. This method is appropriate for a computer with a large virtual memory where the whole asymmetric unit may be considered to be held in core. In the case where only a small slab of density can be held in core this method is not efficient, since each atom in the structure has to be moved through each of the symmetry operators to determine if any part of its volume falls within the required slab. In the second procedure the initial atom coordinates are transformed through all the symmetry positions and a sorted list of those atoms which will have some density within the required asymmetric unit is retained. For any slab of density only those atoms contributing to the slab are used. The disadvantage of this system is that the atom list is very much extended and the exponential factors for duplicate atoms need to be computed a number of times.
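The second procedure can be sketched as follows. This is an illustrative outline only, not the code used in any of the programs cited: the function name, the representation of symmetry operators as Python functions on fractional coordinates, and the use of a slab bounded in z are all assumptions made for the example.

```python
def expand_atoms_for_slab(atoms, symmetry_ops, z_min, z_max, radius_frac):
    """Return the symmetry copies whose density can reach the slab z_min..z_max.

    atoms        : list of (label, x, y, z) in fractional coordinates
    symmetry_ops : list of functions mapping (x, y, z) -> (x, y, z)
    radius_frac  : atom radius expressed as a fraction of the c axis
    """
    selected = []
    for label, x, y, z in atoms:
        for op_index, op in enumerate(symmetry_ops):
            sx, sy, sz = op(x, y, z)
            # Bring the copy back into the unit cell.
            sx, sy, sz = sx % 1.0, sy % 1.0, sz % 1.0
            # Also try the copies one cell below and above, so that atoms
            # sitting close to a cell face are not lost.
            for shift in (-1.0, 0.0, 1.0):
                zc = sz + shift
                if z_min - radius_frac <= zc <= z_max + radius_frac:
                    selected.append((zc, sx, sy, label, op_index))
    selected.sort()  # sorted on z, ready for slab-by-slab processing
    return selected


if __name__ == "__main__":
    # Hypothetical two-operator example (identity and a 2_1 screw along b).
    ops = [
        lambda x, y, z: (x, y, z),
        lambda x, y, z: (-x, y + 0.5, -z),
    ]
    atoms = [("CA", 0.1, 0.2, 0.05), ("CB", 0.3, 0.4, 0.6)]
    for entry in expand_atoms_for_slab(atoms, ops, 0.0, 0.1, 0.08):
        print(entry)
```

The list grows with the number of symmetry operators, which is the disadvantage noted above; in exchange, each slab only needs a scan of a contiguous part of the sorted list.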


The calculation of the gradients uses a similar routine except that premultiplication of the WΔF values by -ih, -ik, -il may change the symmetry of the modified difference map. In R3, for example, premultiplication by -ih or -ik destroys the three-fold symmetry around the origin, which means that in convoluting the atom electron density with the modified difference map, special care has to be taken with atoms which extend over the edge of the asymmetric unit. This problem was overcome by using an expanded asymmetric unit for the modified difference map, which extended in both positive and negative directions on x and y by a distance greater than the maximum atom radius. This is a cumbersome procedure as it requires additional computer core to hold the section of the map and does not lend itself to producing a space group general routine. An alternative solution adopted by Eleanor Dodson (Baker & Dodson, 1979) is to use a symmetry expanded set of atoms, as for the structure factor calculation [...] the space groups P1, P2_12_12_1, P4_12_12, R3, [...]

[...] to use the FFT least-squares, certain conditions with regard to the resolution of the data set, the accuracy of the starting coordinates and the size of the computer have to be met. The accuracy of the refined structure depends on the resolution of the data used: the higher the resolution, the more accurate will be the structure. Generally, diffraction data to a resolution of at least 2 Å are required for a meaningful refinement. However, at the beginning of refinement, when the coordinate errors are large, high resolution terms should not be used, and a refinement with data to a resolution lower than 2 Å may produce some improvement in the model structure.


Test calculations (Agarwal, 1978) have indicated that the method is capable of correcting coordinate errors with an rms value of 0.75 Å. Obviously, a protein structure with this degree of error in the coordinates would not have the geometry expected for peptide units and it is likely that most model structures fitted to electron density maps will have smaller rms errors than this, although some individual atoms could have much larger errors. In both insulin (Isaacs & Agarwal, 1978) and actinidin (Baker & Dodson, 1979) the refinement was able to correct automatically coordinates which were in error by 0.5 Å on average. In actinidin, the starting coordinates for the refinement were those read from a model fitted to a 2.8 Å m.i.r. map, whereas for insulin the coordinates were read by inspection from a 1.5 Å map, phased by the phase refinement method of Sayre (1971, 1974; Cutfield, Dodson, Dodson, Hodgkin, Isaacs, Sakabe & Sakabe, 1975). It now appears that if the lower resolution map is of sufficient quality, there is little to be gained by extending it to a higher resolution in order to improve the coordinates prior to refinement.


For a protein crystallographer, the computing requirements of the program are very modest. The program written by Dodson is flexible in its core storage requirements, and for the actinidin refinement (1820 atoms, 24000 data) required 35K words of store. It does need back up store, preferably on a disc, but could operate with magnetic tape files. The speed of the refinement, both in computer time and manpower, is its greatest attraction. A complete refinement of actinidin, from a set of coordinates read from a 2.8 Å map to a set of refined coordinates with 1.7 Å data, took about 14 hrs of computer time on a DEC 10 and was completed in only three months.

[...]

Isaacs and Agarwal (1978) and Baker and Dodson (1979) [...] their experiences in using the method to refine insulin and actinidin respectively. Generally these experiences and those of Hardman on myoglobin and carbonic anhydrase (personal communication) are similar, and it seems that differences in the size of the problem do not [...] the nature of the refinement. The major difficulty with the [...] is the fact that the shifts calculated for geometrically related atoms, such as those forming a peptide unit, destroy the geometry. This behaviour is characteristic of protein refinements where atoms are allowed to move individually. Causes of this might be the large initial errors in the coordinates, the relative sparseness of the data with fewer than three observations for each variable parameter, and the neglect of atom-atom interactions in the normal matrix. This loss of geometry may be controlled using a program of the type written by Dodson, Isaacs and Rollett (1976) or Ten Eyck, Weaver and Matthews (1976) to correct the gross structural irregularities. Although this method of correcting the geometry every few cycles decreases the rate of convergence, there is no evidence so far to suggest that it adversely affects the accuracy of the final model. Test calculations performed by Stenkamp and Jensen (1976) with simulated protein data support this. A faster rate of convergence could be achieved by incorporating the geometrical constraints in the least-squares equations (Konnert, 1976), but an advantage in separating the least-squares refinement and regularisation procedures is that large shifts on regularisation often indicate gross errors in the structure.


The experience of Isaacs and Agarwal (1978) and Baker and Dodson (1979) with high resolution data indicates that the gross errors in the model may be corrected by refinement with data to a resolution of 2 Å. The use of the weighting scheme proposed is important in placing most weight on the low angle terms for these early cycles. Baker and Dodson (1979) found that the initial seven cycles of coordinate refinement with 2 Å data on actinidin produced an average shift of 0.43 Å for main chain atoms. The remaining 21 cycles of coordinate refinement with the inclusion of data to 1.7 Å produced an average shift of 0.18 Å for the main chain atoms. Much of the manual labour required for a refinement is spent on the poorly defined regions of the structure and on the solvent structure. The well ordered solvent should be included in the model structure as soon as possible, but with regard to other solvent and to disordered structures it is wise to proceed with caution. Isaacs and Agarwal (1978) have discussed how incorrectly assigned solvent (water) molecules confused the interpretation of difference Fourier densities of some side chains, particularly of glutamic acid and arginine residues. The solvent structure can be unravelled (Watenpaugh, Margulis, Sieker & Jensen, 1978) but to do so requires considerable [...]


Conclusion

Table I lists a number of structures which have been subjected to refinement using FFT least-squares refinement programs. The speed of the new algorithm is evident: the complete refinement of actinidin, starting with coordinates from a 2.8 Å m.i.r. phased electron density map, required about 14 hours of computer time on a DEC 10 and was completed in only three months. This work also provides a good estimate of the radius of convergence of the method. The average shift in position for main chain atoms was 0.45 Å and for side chain atoms 0.56 Å.


Acknowledgement 


This tutorial presentation is based on material written by Dr. Neil 
Isaacs of St. Vincent's School of Medical Research, Melbourne, Austra- 
lia. The author is greatly indebted to him for his contribution. 


References 


Adman, E.T., Sieker, L.C. and Jensen, L.H. (1975), Acta Cryst. A31, S34.

Agarwal, R.C. (1978), Acta Cryst. A34, 791-809.

Baker, E.N. and Dodson, E.J. (1979), J. Mol. Biol. in press.

Bode, W. and Schwager, P. (1975), J. Mol. Biol. 98, 693-717.

Chambers, J.L. and Stroud, R.M. (1977), Acta Cryst. B33, 1824-1837.

Cooley, J.W. and Tukey, J.W. (1965), Math. Comput. 19, 297-301.

Cutfield, J.F., Dodson, E.J., Dodson, G.G., Hodgkin, D.C., Isaacs, N.W., Sakabe, K. and Sakabe, N. (1975), Acta Cryst. A31, S21.

Deisenhofer, J. and Steigemann, W. (1975), Acta Cryst. B31, 238-250.

Diamond, R. (1971), Acta Cryst. A27, 436-452.

Diamond, R. (1974), J. Mol. Biol. 82, 371-391.

Dodson, E.J., Isaacs, N.W. and Rollett, J.S. (1976), Acta Cryst. A32, 311-315.

Freer, S.T., Alden, R.A., Carter, C.W. and Kraut, J. (1975), J. Biol. Chem. 250, 46-54.

Hardman, K.D. (1978), Acta Cryst. A34, S65.

Huber, R., Kukla, D., Bode, W., Schwager, P., Bartels, K., Deisenhofer, J. and Steigemann, W. (1974), J. Mol. Biol. 89, [...].

Hull, S.E., Karlsson, R., Main, P., Woolfson, M.M. and Dodson, E.J. (1978), Nature 275, 206-207.

International Tables for X-ray Crystallography (1974), Vol. IV, Birmingham: Kynoch Press.

Isaacs, N.W. and Agarwal, R.C. (1978), Acta Cryst. A34, 782-791.

Jack, A. and Levitt, M. (1978), Acta Cryst. A34, 931-935.

Konnert, J.H. (1976), Acta Cryst. A32, 614-617.

Moews, P.C. and Kretsinger, R.H. (1975), J. Mol. Biol. 91, 201-228.

Sayre, D. (1951), Acta Cryst. 4, 362-367.

Sayre, D. (1972), Acta Cryst. A28, 210-212.

Sayre, D. (1974), Acta Cryst. A30, 180-184.

Schmidt, Jr., W.C., Girling, R.L. and Amma, E.L. (1977), Acta Cryst. B33, 3618-3620.

Stenkamp, R.E. and Jensen, L.H. (1976), Acta Cryst. A32, 255-255.

Sussman, J.L., Holbrook, S.R., Church, G.M. and Kim, S.H. (1977), Acta Cryst. A33, 800-804.

Takano, T. (1977), J. Mol. Biol. 110, 537-584.

Ten Eyck, L.F. (1973), Acta Cryst. A29, 183-191.

Ten Eyck, L.F. (1977), Acta Cryst. A33, 486-492.

Ten Eyck, L.F., Weaver, L.H. and Matthews, B.W. (1976), Acta Cryst. A32, 349-350.

Watenpaugh, K.D., Sieker, L.C., Herriott, J.R. and Jensen, L.H. (1973), Acta Cryst. B29, 943-956.

Watenpaugh, K.D., Margulis, T.N., Sieker, L.C. and Jensen, L.H. (1978), J. Mol. Biol. 122, 175-190.

Winograd, S. (1978), Mathematics of Computation 32, 175-199.


TABLE I

Some examples of structures refined by FFT least-squares

[Table body not recoverable. Column headings: molecule, atoms, data, d (Å), R factors (initial, final), number of cycles, cpu time. Rows legible by name include insulin, carbonic anhydrase, deoxymyoglobin and actinidin. Sources given in the table: Isaacs and Agarwal (1978); Agarwal (1978); Hardman, K.D., personal communication; Baker and Dodson (1979); Hull, Karlsson, Main, Woolfson and Dodson (1978).]


PHASE EVALUATION AND SOME ASPECTS OF THE 
FOURIER REFINEMENT OF MACROMOLECULES 


M. Vijayan


Molecular Biophysics Unit, Indian Institute of Science 
Bangalore 560 012, India 


SUMMARY 


The statistical method for the determination of protein phases from isomorphous and anomalous differences has been described. Methods for estimating r.m.s. errors and the suggested modifications to the classical Blow and Crick formulation have been reviewed. The Fourier methods for the refinement of protein structures have been outlined. A theoretical analysis, with special reference to protein structures, has been presented. The theory provides a rationalisation for the use of general Fourier syntheses with (m F_N - n F_c) exp(i\alpha_c) as coefficients; it also leads to the determination of optimum values of m, n and other related parameters. A method for the empirical determination of the parameters has been suggested. The treatment of "inner reflections" affected by solvent, the reliability of refined coordinates and the checking procedures employed during the course of the refinement have also been discussed.


INTRODUCTION 


Despite the striking advances made in recent years in the development of novel methods and procedures for the x-ray study of macromolecules, isomorphous replacement, which is often used in conjunction with anomalous scattering data, remains the most important and almost indispensable method of phase determination in macromolecular crystallography. This contribution, therefore, starts with a description of the classical phase evaluation procedures using isomorphous and anomalous differences; some suggested improvements to the classical formulation are also touched upon.


Until recently, macromolecular crystallography has been concerned primarily with the determination of the gross three-dimensional structure of macromolecules from isomorphously phased electron-density maps. Several successful attempts have, however, been made during the last few years to refine macromolecular structures to the limits of accuracy permitted by the total available data from the native crystals. The excellent set of articles on the high resolution refinement of protein structures in the Proceedings of the 1975 International Summer School on Crystallographic Computing Techniques held at Prague gives a nearly comprehensive account of the state of the work in this area as it then existed. There have been several subsequent developments, some of which are discussed in other contributions in this Winter School. I shall be concerned, in the latter part of this contribution, with the Fourier methods of macromolecular structure refinement. No attempt will, however, be made to give a comprehensive account of the application of these methods. Instead, I shall endeavour to outline the methods employed, present a theoretical analysis, highlight some special problems and suggest some improvements in the existing procedures.


The crystallographic techniques developed for the structure analysis of proteins can be, and have been, used for the study of other macromolecules as well. However, as most of the macromolecular crystallographic studies have till now been concerned with proteins, the term "protein" will be used here, for the sake of convenience, to refer to macromolecules in general.


PHASE EVALUATION 


Isomorphous replacement and anomalous dispersion methods 


The preparation of isomorphous protein heavy atom derivatives involves the attachment of "heavy" atoms like mercury, lead or uranium, or chemical groups containing them, to protein crystals in a coherent manner without changing the conformation of the molecules and their crystal packing. Thus, ideally, the structures of a protein crystal and a derivative crystal should be identical as far as the atomic positions are concerned, except for the presence of heavy atoms in the latter. Under such conditions of ideal isomorphism, and neglecting experimental errors, the structure factor of the derivative, the structure factor of the native protein and the heavy atom contribution [...]. Their magnitudes are denoted by F_PH, F_P and F_H, and the respective phase angles are \alpha_PH, \alpha_P and \alpha_H. Of these, F_PH and F_P are obtained directly from experimental data. F_H can be obtained from the refined heavy atom [...]. The phase angle of the protein structure factor is then given by

    \alpha_P = \alpha_H \pm \phi      (1)

where \cos\phi = (F_{PH}^2 - F_P^2 - F_H^2) / 2 F_P F_H. Thus there are two possible values for \alpha_P placed symmetrically about \alpha_H. This ambiguity can be resolved if data from two independent derivatives are available. Two equations like (1) would then be available:

    \alpha_P = \alpha_{H1} \pm \phi_1

and

    \alpha_P = \alpha_{H2} \pm \phi_2,

[Figures 1, 2, 3 and 4]

where subscripts 1 and 2 refer to derivatives 1 and 2 respectively. There are thus two possible sets of values. That value which is common to both the sets corresponds to the correct protein phase angle. The situation can be demonstrated graphically with the aid of the so-called Harker construction (Harker, 1956) shown in Figure 2. A circle is drawn with F_P as radius and the origin of the vector diagram as the centre. Two more circles are drawn with F_PH1 and F_PH2 as radii and the ends of the vectors -F_H1 and -F_H2 as the centres. Both the circles intersect the F_P circle at two points each. One of the points of intersection is common. That point defines the phase angle of the structure factor from the native crystal. Thus protein phase angles can be determined if a minimum of two independent heavy atom derivatives are available.
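The resolution of the two-fold ambiguity with two derivatives can be illustrated with a short sketch based on equation (1). It assumes error-free amplitudes and known heavy atom phases; the function names are hypothetical and the choice of the "common" value as the closest pair of candidates is a simplification made for the example.

```python
import math


def candidate_phases(f_p, f_ph, f_h, alpha_h_deg):
    """Two possible protein phases (degrees) from one derivative, equation (1)."""
    cos_phi = (f_ph ** 2 - f_p ** 2 - f_h ** 2) / (2.0 * f_p * f_h)
    cos_phi = max(-1.0, min(1.0, cos_phi))  # guard against rounding
    phi = math.degrees(math.acos(cos_phi))
    return ((alpha_h_deg + phi) % 360.0, (alpha_h_deg - phi) % 360.0)


def angular_difference(a, b):
    """Smallest separation between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def resolve_phase(f_p, derivative_1, derivative_2):
    """Pick the phase common to both derivatives.

    Each derivative is a tuple (f_ph, f_h, alpha_h_deg). The closest pair
    of candidates is taken as the common value and averaged on the circle.
    """
    c1 = candidate_phases(f_p, *derivative_1)
    c2 = candidate_phases(f_p, *derivative_2)
    _, a, b = min((angular_difference(a, b), a, b) for a in c1 for b in c2)
    x = math.cos(math.radians(a)) + math.cos(math.radians(b))
    y = math.sin(math.radians(a)) + math.sin(math.radians(b))
    return math.degrees(math.atan2(y, x)) % 360.0
```

With real data the candidate pairs never coincide exactly, which is what motivates the probabilistic treatment of Blow and Crick.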


The dispersion correction terms (International Tables for X-ray Crystallography, 1962) for atoms with high atomic numbers are appreciable and hence the heavy atoms in protein derivatives are usually anomalous scatterers. Assuming that the heavy atoms in the derivatives are the only anomalous scatterers and that all the heavy atoms in any given derivative are of the same type, the relation between the structure factor of a reflection hkl from a derivative and that of its Friedel partner -h-k-l can be represented as in Figure 3. The magnitudes of the two structure factors are denoted by F_PH(+) and F_PH(-), respectively. F_H is the real part of the heavy atom contribution, including that due to the real part of the dispersion correction, and F_H'' is the imaginary component of the heavy atom contribution. It is readily seen that F_PH(+) and F_PH(-) could be formally considered as the structure factors of any given reflection arising from two independent derivatives. The Harker diagram can then be constructed as shown in Figure 4. Therefore, in principle, protein phase angles can be determined using a single derivative when anomalous scattering effects are also made use of.

It is interesting to note that, for any given derivative, the information obtained from isomorphous differences, F_PH - F_P, and that obtained from anomalous differences, F_PH(+) - F_PH(-), are complementary. The isomorphous difference for any given reflection is maximum if F_P and F_H are parallel or antiparallel. The anomalous difference, then, is zero if all the anomalous scatterers are of the same type, and the native phase angle is determined uniquely on the basis of the isomorphous difference. In general, the isomorphous difference decreases and the anomalous difference increases as the inclination between F_P and F_H increases. Assuming F_H to be small compared to F_P, the isomorphous difference tends to be small and the anomalous difference tends to have its maximum possible value when F_P and F_H are perpendicular to each other. The anomalous difference then has a predominant influence in determining the phase angle.


Blow and Crick formulation 


In a real situation, conditions are far from ideal on account of several factors, the chief among them being imperfect isomorphism, errors in the estimation of heavy atom parameters and the experimental errors in the measurement of intensity from the native and the derivative crystals. Consequently it is desirable to use as many derivatives as are available for phase determination. All the circles would not then intersect at a single point in the Harker diagram; instead there would be a distribution of intersections. Thus what one obtains is not a unique phase angle, but a probability distribution for the phase angle.

The statistical procedure employed so far for deriving phase angles using multiple isomorphous replacement (MIR) is based on the classical treatment by Blow and Crick (1959). In their treatment, Blow and Crick assume, for mathematical convenience, that all errors could be considered as residing in the magnitude of the derivative structure factor alone. They make a further assumption that those errors could be described by a Gaussian distribution. With these simplifying assumptions, the statistical procedure for phase determination can be readily worked out as follows.


Figure 5 shows the vector diagram for a reflection from a particular derivative with an arbitrary value \alpha for the protein phase angle. Referring to the Figure, we have for the derivative

    D_{Hi}(\alpha) = \{ F_P^2 + F_{Hi}^2 + 2 F_P F_{Hi} \cos(\alpha_{Hi} - \alpha) \}^{1/2}      (2)

If \alpha corresponds to the true protein phase angle \alpha_P, then D_{Hi}(\alpha) coincides with F_{PHi}. The amount by which D_{Hi}(\alpha) differs from F_{PHi}, namely,

    \xi_{Hi}(\alpha) = D_{Hi}(\alpha) - F_{PHi}      (3)

is the lack of closure. The probability that \alpha is the correct protein phase angle is then

    P_i(\alpha) = N_i \exp\{ -\xi_{Hi}^2(\alpha) / 2E_i^2 \}      (4)

where N_i is the normalisation constant and E_i is an estimate of the r.m.s. error. When a number of heavy atom derivatives are available, the total probability of a phase angle \alpha being correct would be

    P(\alpha) = \prod_i P_i(\alpha) = N \exp[ -\sum_i \xi_{Hi}^2(\alpha) / 2E_i^2 ]      (5)

where the summation is over all the derivatives.
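The combined phase probability can be evaluated on a grid of phase angles. The sketch below follows the Blow and Crick assumptions stated above (Gaussian errors residing in the derivative amplitude); the function names, the tuple layout for a derivative and the 5 degree step are assumptions made for illustration.

```python
import math


def lack_of_closure(alpha_deg, f_p, f_ph, f_h, alpha_h_deg):
    """Calculated minus observed derivative amplitude for a trial phase alpha."""
    arg = math.radians(alpha_h_deg - alpha_deg)
    d_sq = f_p ** 2 + f_h ** 2 + 2.0 * f_p * f_h * math.cos(arg)
    return math.sqrt(max(d_sq, 0.0)) - f_ph


def phase_probability(f_p, derivatives, step_deg=5):
    """Return (angles, probabilities) with the probabilities summing to one.

    derivatives: list of (f_ph, f_h, alpha_h_deg, e_rms), one per derivative.
    """
    angles = list(range(0, 360, step_deg))
    raw = []
    for alpha in angles:
        exponent = 0.0
        for f_ph, f_h, alpha_h_deg, e_rms in derivatives:
            xi = lack_of_closure(alpha, f_p, f_ph, f_h, alpha_h_deg)
            exponent -= xi * xi / (2.0 * e_rms * e_rms)
        raw.append(math.exp(exponent))
    total = sum(raw)
    return angles, [p / total for p in raw]
```

Reducing `e_rms` sharpens the peaks of the distribution, and increasing it flattens them, so the estimate of E for each derivative directly controls its weight in the product.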


When P(\alpha) for any particular reflection is plotted around a circle of unit radius, as shown in Figure 6, the phase corresponding to the highest peak in the probability distribution would give the most probable protein phase \alpha_M for the reflection. Then the Fourier synthesis with

    F_P \exp(i\alpha_M)

as coefficients would give the most probable electron-density distribution in the protein. A different way of using the probability distribution has been described by Blow and Crick. In Figure 6, the centroid of the probability distribution is at point P. The polar coordinates of P are m and \alpha_B where m, a fractional positive number with a maximum value of unity, and \alpha_B are referred to as the "figure of merit" and the "best phase" respectively. A Fourier synthesis with

    m F_P \exp(i\alpha_B)

as coefficients is called the "best Fourier". Defined in this manner, the best Fourier would give the electron-density distribution with the lowest r.m.s. error. The best Fourier synthesis rather than the most probable Fourier synthesis is usually employed in the structure analysis of proteins. In practice, the figure of merit and the best phase are calculated using the expressions

    m \cos\alpha_B = \sum_i P(\alpha_i) \cos\alpha_i / \sum_i P(\alpha_i)
                                                                              (6)
    m \sin\alpha_B = \sum_i P(\alpha_i) \sin\alpha_i / \sum_i P(\alpha_i)

where P(\alpha_i) are calculated, say, at 5° intervals (Dickerson et al., 1961). The figure of merit gives an estimate of precision of the calculated phase angle and it is statistically interpreted as the cosine of the expected [...] the calculated phase angle. Obviously, it has a high value when [...].
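Equations (6) amount to taking the centroid of the sampled probability distribution on the unit circle. A minimal sketch, with a hypothetical function name and an arbitrary sampled distribution as input, is:

```python
import math


def best_phase_and_fom(angles_deg, probabilities):
    """Centroid of a sampled phase probability distribution, equations (6).

    Returns (m, alpha_b_deg): the figure of merit and the best phase.
    The probabilities need not be normalised.
    """
    total = sum(probabilities)
    x = sum(p * math.cos(math.radians(a))
            for a, p in zip(angles_deg, probabilities)) / total
    y = sum(p * math.sin(math.radians(a))
            for a, p in zip(angles_deg, probabilities)) / total
    m = math.hypot(x, y)
    alpha_b = math.degrees(math.atan2(y, x)) % 360.0
    return m, alpha_b
```

A sharply unimodal distribution gives m close to unity, whereas two equal peaks 60° apart give a best phase midway between them with m = cos 30°, in line with the interpretation of m as the cosine of the expected phase error.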


In the presence of anomalous scattering data, when F_PH(+) and F_PH(-) are treated as arising from two independent derivatives, the effect of anomalous differences on phase determination would only be marginal as, for any given reflection, the difference between F_PH(+) and F_PH(-) is usually small. North (1965) has, however, pointed out that the error in the anomalous difference for a given reflection would normally be much smaller than that in the corresponding isomorphous difference. First, the former is obviously free from the effects of non-isomorphism. Secondly, as F_PH(+) and F_PH(-) are measured from the same crystal, both these quantities are expected to have the same systematic errors. These errors are eliminated in the difference between the two quantities. Therefore, different estimates of the root mean square error E should be used for isomorphous and anomalous differences. Then, for any given derivative, the new expression for the probability distribution of the protein phase angle in the presence of anomalous scattering data would be

    P_i(\alpha) = N_i \exp\{ -\xi_{Hi}^2(\alpha) / 2E_i^2 \} \exp\{ -[\Delta H_i - \Delta H_{ical}(\alpha)]^2 / 2E_i'^2 \}      (7)

where \Delta H_i = F_{PHi}(+) - F_{PHi}(-) is the observed anomalous difference,

    \Delta H_{ical}(\alpha) = 2 F''_{Hi} \sin(\alpha_{Di} - \alpha_{Hi})

is the corresponding value of the anomalous difference calculated for the phase angle \alpha, \alpha_{Di} being the phase of D_{Hi}(\alpha), and E_i' is the r.m.s. [...]. In the derivation of (7), F_{PHi} is taken [...] F_{PHi}(+) and F_{PHi}(-), and F_{Hi} is approximated to [...] the heavy atom contribution including [...] dispersion correction.

Estimation of E and E'

E and E' are two important parameters [...] computing phase angles. Blow and Crick suggested the use of centric reflections, when present, to evaluate E:

    E^2 = \sum_n \{ |F_{PH} \pm F_P| - F_H \}^2 / n      (8)

The values of E (one for each derivative) thus obtained can be used for acentric reflections as well. When preliminary estimates of phase angles are available and when the number of derivatives is large, E can be calculated as the r.m.s. lack of closure corresponding to \alpha_B (Kartha, 1976). The r.m.s. error in anomalous differences can also be evaluated by similar methods. In general, the value of E' for any given derivative is about a third of the corresponding value of E.
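Equation (8) can be evaluated directly from a list of centric reflections. The sketch below is illustrative: the function name and the `same_sign` flag (true when F_PH and F_P are taken to have the same sign, so that the difference rather than the sum is used) are assumptions, and how that sign relationship is decided is left outside the example.

```python
import math


def rms_error_centric(reflections):
    """Estimate E from centric reflections using equation (8).

    reflections: iterable of (f_ph, f_p, f_h, same_sign) with amplitudes
    as positive numbers. same_sign selects |F_PH - F_P|; otherwise the
    sum F_PH + F_P is used (the 'cross-over' case).
    """
    total = 0.0
    count = 0
    for f_ph, f_p, f_h, same_sign in reflections:
        observed = abs(f_ph - f_p) if same_sign else (f_ph + f_p)
        total += (observed - f_h) ** 2
        count += 1
    if count == 0:
        raise ValueError("no centric reflections supplied")
    return math.sqrt(total / count)
```

One value of E is obtained per derivative; in practice the estimate is often made in resolution shells, which would only require grouping the input list before calling the function.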
The values of E and E' for each derivative can also be evaluated by a different method (Adams, 1968) as outlined below using the following well known relations (Ramachandran and Raman, 1956):

    \cos\alpha = -(F_P^2 - F_{PH}^2 - F_H^2) / 2 F_{PH} F_H      (9)

and

    \sin\alpha = (F_{PH}(+) - F_{PH}(-)) / 2 F_H''      (10)

where \alpha = \alpha_{PH} - \alpha_H. We obtain what may be called \alpha_{iso} if the magnitude of \alpha is determined from (9) and the quadrant from (10). Similarly, we obtain \alpha_{ano} if the magnitude of \alpha is determined from (10) and the quadrant from (9). The difference between \alpha_{iso} and \alpha_{ano} is a measure of the errors present in the data. From (9) we have

    F_P^2 = F_{PH}^2 + F_H^2 - 2 F_{PH} F_H \cos\alpha      (11)

Using \alpha_{ano} in (11) we obtain what may be considered as the calculated value of F_P (F_{Pcal}). Assuming that all errors lie in F_P, the values of E can be calculated using the expression

    E^2 = \sum_n |F_P - F_{Pcal}|^2 / n      (12)

Now assuming F_{PH} to be equal to (F_{PH}(+) + F_{PH}(-))/2, we have from (10)

    F_{PH}(+) - F_{PH}(-) = 2 F_H'' \sin\alpha      (13)

The value obtained by using \alpha_{iso} in (13) may be considered as the calculated value of the anomalous difference (\Delta H_{cal}). The values of E' can then be evaluated using the expression

    E'^2 = \sum_n |\Delta H - \Delta H_{cal}|^2 / n      (14)
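The steps of this procedure can be sketched for a single reflection as follows. The function name and argument names are hypothetical, and clamping of the cosine and sine to the range [-1, 1] is an added safeguard for data with errors rather than part of the method as described.

```python
import math


def alpha_iso_ano(f_p, f_ph_plus, f_ph_minus, f_h, f_h_imag):
    """Return (alpha_iso, alpha_ano, f_p_cal, delta_h_cal); angles in degrees.

    alpha = alpha_PH - alpha_H. f_h_imag is the imaginary heavy atom
    contribution F_H''.
    """
    f_ph = 0.5 * (f_ph_plus + f_ph_minus)
    cos_a = (f_ph ** 2 + f_h ** 2 - f_p ** 2) / (2.0 * f_ph * f_h)  # (9)
    sin_a = (f_ph_plus - f_ph_minus) / (2.0 * f_h_imag)             # (10)
    cos_a = max(-1.0, min(1.0, cos_a))
    sin_a = max(-1.0, min(1.0, sin_a))
    sign = 1.0 if sin_a >= 0.0 else -1.0

    # alpha_iso: magnitude from (9), quadrant (sign) from (10).
    alpha_iso = sign * math.degrees(math.acos(cos_a))

    # alpha_ano: magnitude from (10), quadrant from the sign of (9).
    base = math.degrees(math.asin(abs(sin_a)))
    if cos_a < 0.0:
        base = 180.0 - base
    alpha_ano = sign * base

    # F_Pcal from (11) with alpha_ano; Delta H_cal from (13) with alpha_iso.
    f_p_cal_sq = (f_ph ** 2 + f_h ** 2
                  - 2.0 * f_ph * f_h * math.cos(math.radians(alpha_ano)))
    f_p_cal = math.sqrt(max(f_p_cal_sq, 0.0))
    delta_h_cal = 2.0 * f_h_imag * math.sin(math.radians(alpha_iso))
    return alpha_iso, alpha_ano, f_p_cal, delta_h_cal
```

Accumulating (F_P - F_Pcal)^2 and (ΔH - ΔH_cal)^2 over all reflections of a derivative and taking the root of the means then gives E and E' as in (12) and (14).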
n 
The values of E can also be evaluated from acentric data by making use of the values of F_H estimated by combining isomorphous and anomalous differences. Kartha and Parthasarathy (1965) and Matthews (1966) have given approximate formulae for estimating F_H; the exact formula was subsequently derived by Singh and Ramaseshan (1966). The estimate of F_H, however, is not unambiguous. For every reflection there are two possible estimates, an upper estimate (F_{HUE}) and a lower estimate (F_{HLE}). Under normal conditions, F_{HLE} would correspond to the correct estimate for most reflections (Dodson and Vijayan, 1971). Assuming that all errors lie in F_H, the values of E can be calculated from the expression

    E^2 = \sum_n |F_{HLE} - F_H|^2 / n      (15)


The root mean square error E (and also E' when anomalous differences are used) is an important parameter in phase determination. For a given derivative the sharpness of the peak(s) in the probability distribution obviously depends on the choice of E. When several derivatives are used, an overall decrease in the values of E from their correct values leads to artificially sharper peaks, the movement of \alpha_B towards \alpha_M and deceptively high figures of merit. Opposite effects result from an increase in the values of E. It is also important to see that the estimated E in each derivative is a correct measure of the r.m.s. error for that particular derivative, to ensure the correct relative contribution from the derivative to the overall phase probability distribution.


Suggested modifications to Blow and Crick formulation 
ouppescca modifications to Blow and Crick formulation 


Ashida (1976) has discussed some modifications to the Blow and
Crick procedure while retaining its essential characteristics in form
as well as in content. In the first modification discussed by him,
originally proposed by Cullis et al. (1961), all the $E_i$'s are
assumed to be the same and the lack of closure error $r_i$ for the ith
derivative is measured as the distance from the mean of all
intersections between phase circles to the point of intersection of the
phase circle of that derivative with the phase circle of the native
protein. In the second modification discussed by him, individual values
of $E_i$ for different derivatives are retained, but the lack of
closure is measured from the weighted mean of all intersections.


Another modification, again within the framework of the Blow and
Crick formulation, was earlier proposed by Hendrickson and Lattman
(1970). The Blow and Crick procedure is based on the relation

$$|F_P + F_{Hi}| = F_{PHi} + \epsilon_i \qquad (16)$$

where $\epsilon_i$ is the "lumped" error, assumed to be Gaussian, in
$F_{PHi}$. Hendrickson and Lattman instead use the relation

$$|F_P + F_{Hi}|^2 = F_{PHi}^2 + \epsilon''_i \qquad (17)$$

where $\epsilon''_i$ is the lumped error, again assumed to be Gaussian,
in $F_{PHi}^2$. The corresponding r.m.s. error, $E''_i$, can be
evaluated using methods analogous to those employed for evaluating
$E_i$. Hendrickson and Lattman point out that whereas the values of $E$
have been shown to be only slightly dependent on the measured
intensities, the values of $E''$ would necessarily be functions of
structure factor amplitudes. The real advantage in using the modified
procedure lies in the fact that the exponent in the probability
expression can then be expressed as a linear combination of five terms
in the following manner

$$-\epsilon''^2_i(\alpha)/2E''^2_i = K_i + A_i\cos\alpha + B_i\sin\alpha + C_i\cos 2\alpha + D_i\sin 2\alpha \qquad (18)$$

where $K_i$, $A_i$, $B_i$, $C_i$ and $D_i$ are constants dependent on
$F_P$, $F_{Hi}$, $F_{PHi}$ and $E''_i$. The complete probability
distribution of any reflection can thus be expressed in terms of five
constants. Similar expressions have been derived for phase information
from anomalous scattering, tangent formula, partial structure and
non-crystallographic symmetry. The phase information from all sources
can then be combined by simply taking the total value of each constant.
Thus, the total probability of the phase angle to be $\alpha$ is given
by

$$P(\alpha) = \prod_s P_s(\alpha) = N' \exp\Big(\sum_s K_s + \sum_s A_s\cos\alpha + \sum_s B_s\sin\alpha + \sum_s C_s\cos 2\alpha + \sum_s D_s\sin 2\alpha\Big) \qquad (19)$$

where $K_s$, $A_s$, etc. are the constants appropriate to the sth
source and $N'$ is the normalisation constant.
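The five constants of (18) for an isomorphous derivative follow from
expanding the square of the lumped error in (17), writing
a = F_P^2 + F_H^2 - F_PH^2 and b = 2 F_P F_H. The sketch below carries
out that expansion and combines sources by summing constants as in
(19). It is an illustrative derivation under those definitions; the
function names are assumptions.

```python
import math


def hl_coefficients(fp, fh, alpha_h_deg, fph, e_dd):
    """Return (K, A, B, C, D) for one derivative; e_dd is E''.

    eps''(alpha) = a + b*cos(alpha - alpha_H), so that
    eps''^2 = a^2 + b^2/2 + 2ab*cos(alpha - alpha_H)
              + (b^2/2)*cos(2*alpha - 2*alpha_H).
    """
    ah = math.radians(alpha_h_deg)
    a = fp * fp + fh * fh - fph * fph
    b = 2.0 * fp * fh
    w = 1.0 / (2.0 * e_dd * e_dd)
    k = -w * (a * a + 0.5 * b * b)
    big_a = -w * 2.0 * a * b * math.cos(ah)
    big_b = -w * 2.0 * a * b * math.sin(ah)
    big_c = -w * 0.5 * b * b * math.cos(2.0 * ah)
    big_d = -w * 0.5 * b * b * math.sin(2.0 * ah)
    return (k, big_a, big_b, big_c, big_d)


def combine(sources):
    """Sum the constants from several sources of phase information."""
    return tuple(sum(values) for values in zip(*sources))


def log_probability(coeffs, alpha_deg):
    """Exponent of (19), without the normalisation constant."""
    k, a, b, c, d = coeffs
    x = math.radians(alpha_deg)
    return (k + a * math.cos(x) + b * math.sin(x)
            + c * math.cos(2.0 * x) + d * math.sin(2.0 * x))
```

Storing five numbers per reflection and per source is what makes the
combination step a simple addition.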


When the total probability of the phase angle being $\alpha$ is
represented as

$$P(\alpha) = \prod_i P_i(\alpha),$$

the individual probabilities obtained from different derivatives are
assumed to be independent and hence are multiplied to get the total
probability. This follows from the assumption that all errors reside in
$F_{PH}$. However, in fact, $F_P$, $F_H$ and $F_{PH}$ should all be
considered in error.

If the probability related to errors in $F_P$ is denoted by $P_0$,
Einstein (1977) points out that each $P_i$ involves the term $P_0$.
Therefore, different $P_i$ are not independent and hence should not be
multiplied. The effect of multiplying them is to give too high a weight
to the observed $F_P$. This effect is sought to be eliminated by Raiz
and Andreeva (1970) and Einstein (1977) by the explicit consideration
of errors in $F_P$ as well. Following Einstein, the basic principle of
their procedure can be illustrated diagrammatically as shown in
Figure 7. The Harker diagram for any given derivative no longer
consists of two intersecting circles. The parent circle is replaced by
the probability distribution $P_0$ which is indicated as a shaded
annular region between circles of equi-probability contours. The
probability distribution $P'_i$, related to errors in $F_{PH}$ and
$F_H$, is also shown in a similar manner. The joint probability
distribution $P_0 P'_i$ can now be used instead of $P_i$. $P_0 P'_i$ is
multiplied by a distribution of the type $P'_i$ for each additional
derivative. Mathematical formulae for describing such joint probability
distributions have been derived by Einstein. A procedure for including
anomalous scattering data, within the framework of his formulation, has
also been described.


Perhaps the most comprehensive set of modifications to the Blow
and Crick formulation is that suggested by Green (1979) although the
treatment is limited to the case of a single derivative. Errors arising
from imperfect isomorphism, errors in the heavy atom positions and
those associated with $F_{PH}$ and $F_P$ are separately considered in
this treatment. Probability formulae for situations where errors of
each type or all types are present have been derived. Methods have also
been suggested for estimating the r.m.s. value of each type of error.
Perhaps the most interesting result of Green's analysis is the
treatment of imperfect isomorphism. The analysis shows that the
reliability of phase estimation decreases with increasing
$\sin\theta/\lambda$ when isomorphism is assumed to be imperfect. This
is indeed what one would expect on physical grounds as the effects of
departures from strict isomorphism are likely to be important at high
resolution where the d spacings tend to be comparable to the magnitudes
of such departures.


REFINEMENT OF PROTEIN STRUCTURES


Most, but by no means all, of the refinements performed to date on
protein structures have been carried out using one of two Fourier
methods, namely, that developed by the Munich group and that used by
most others.
The Munich group has refined several protein structures (Huber et
al. 1974; Deisenhofer and Steigemann, 1975; [...] et al. 1975; Bode and
Schwager, 1975) using the procedure outlined below. In their method, a
synthesis with coefficients

$$(nF_o - (n-1)F_c)\exp(i\alpha_c) \qquad (20)$$

where $F_o$ is the magnitude of the observed structure factor, and
$F_c$ and $\alpha_c$ are the magnitude and the phase angle of the
structure factor calculated from the current set of coordinates
(starting set for the first cycle), is computed in each cycle of
refinement. The model is then fitted to the resulting map using the
real space refinement procedure developed by Diamond (1971). The
resulting coordinates are then used as input parameters for the next
cycle of refinement which again involves the calculation of a map with
(20) as coefficients and subsequent real space refinement. The
automatic procedure is interrupted, when necessary, to correct gross
errors and to locate solvent molecules using conventional difference
Fourier ($\Delta F$) maps with

$$(F_o - F_c)\exp(i\alpha_c) \qquad (21)$$

as coefficients. A new cycle of refinement then starts with the
corrected coordinates. The cyclic procedure is stopped when convergence
in terms of R factor and parameter shifts is reached, and the
$\Delta F$ map is flat.


The method employed by most others (Watenpaugh et al. 1973;
Freer et al. 1975; Moews and Kretsinger, 1975; Chambers and Stroud,
1977; Stenkamp et al. 1978; Blake et al. 1978; Dodson et al. 1979) is
essentially the same as that used in the refinement of small molecular
structures. The $\Delta F$ synthesis is cyclically used for correcting
positional and thermal parameters. Shifts in positional parameters are
calculated using the relation

$$\delta x_i = -\,\mathrm{gradient}/\mathrm{curvature} = -\frac{\partial \Delta\rho/\partial x_i}{\partial^2 \rho_o/\partial x_i^2} \qquad (22)$$

where $\delta x_i$ is the shift in the parameter $x_i$. The gradients
in the $\Delta F$ map at the assumed atomic positions are estimated by
interpolation of difference densities at the surrounding grid points.
The approximate curvatures of different atoms are estimated by one
empirical method or the other from an electron-density ($F_o$) map. A
variety of empirical formulae have been used for estimating shifts in
thermal parameters: all of them naturally seek to increase the B value
if the difference density at the assumed atomic position is negative
and decrease the B value if it is positive. As the resolution of the
data is limited, the shifted
parameters do not, in general, lead to acceptable molecular geo- 
metries: Therefore, the protein molecule is regularised, in gator 
cycles: using one of the available automatic methods (Diamond, 19 ‘ 
Hermans and McQueen, 1974; Dodson et al. 1976) to restore molecular 
dimensions to within acceptable limits. 
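The gradient/curvature shift of relation (22) can be illustrated in one
dimension. The sketch below is a toy example rather than a refinement
program: the maps are passed as callables standing in for interpolated
densities, the derivatives are approximated by finite differences with
an assumed grid step h, and the function name is hypothetical.

```python
def gradient_curvature_shift(diff_density, density, x, h):
    """Return the shift -gradient/curvature at the assumed position x.

    diff_density: callable giving the difference density near x.
    density: callable giving the electron density (F_o map) near x.
    h: grid step used for the finite-difference estimates.
    """
    gradient = (diff_density(x + h) - diff_density(x - h)) / (2.0 * h)
    curvature = (density(x + h) - 2.0 * density(x) + density(x - h)) / (h * h)
    if curvature == 0.0:
        raise ZeroDivisionError("curvature vanishes at the assumed position")
    return -gradient / curvature
```

For a single Gaussian atom misplaced by a small distance, the shift
returned is close to the negative of the misplacement, so the corrected
position moves towards the true one. In a real difference map the peak
heights are reduced, so the shifts are underestimated.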


It is convenient to discuss some of the problems associated with
the Fourier refinement of protein structures in terms of the following
theoretical analysis. If a protein structure contains a total of $N$
atoms (including those in solvent molecules), at any given stage in the
course of refinement, the inaccurate positions $r'_{Pj}$ of $P$ atoms
are known whereas the positions $r_{Qj}$ of the remaining $Q$ atoms are
unknown. The true positions of the $P$ atoms may be denoted by
$r_{Pj}$. The calculated structure factors corresponding to $r_{Pj}$
and $r'_{Pj}$ may be denoted by $F_P\exp(i\alpha_P)$ and
$F'_P\exp(i\alpha'_P)$, and the magnitude of the observed structure
factor by $F_N$. Obviously, $F_N$, $F'_P$ and $\alpha'_P$ correspond to
the conventional $F_o$, $F_c$ and $\alpha_c$ respectively. We also
define

$$S_N^2 = \sum_{j=1}^{N} f_{Nj}^2, \quad S_P^2 = \sum_{j=1}^{P} f_{Pj}^2 \quad \text{and} \quad S_Q^2 = \sum_{j=1}^{Q} f_{Qj}^2 \qquad (23)$$

where $f_{Xj}$ is the scattering factor for atom $Xj$. $f_{Xj}$ is
assumed to include the appropriate temperature factor. Obviously,
$N = P + Q$ and $S_N^2 = S_P^2 + S_Q^2$. Then, following the methods
developed by Ramachandran and Srinivasan (1970), it can be shown
(Vijayan, 1980) that a general Fourier synthesis with coefficients

$$(mF_N - nF'_P)\exp(i\alpha'_P) \qquad (24)$$

has the following peak positions and strengths


mSy-nSp 
ees 


-{ 
*Pj . 25) 
J Sp fp; ( 


-1t -1 -1 
fast? -r 
P l oa bas 
j “Pk *P asp fpifpyfp (26) 


(j #1) 
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r ) = + Th) k = TQ) de 
) Vik — — { ; - 
| 2SySp P)'QKfQ1 en 
(k#1) 
on - 
Tp: rp) -Tr m 1 I 
; ¢ {p;f 
al Sysp ‘PifPKfar (28) 
r ; =9 = 
“Pj 'Pk7 TP] 35-5— fpjfrd 
Pj'Pk : 
ey PiPlP (29) 
(j#1) 
- wf - 
"Qj*'PK- TP — fajfpKipy I 


2s NP 


The above distribution of peaks can be considered as consisting of
shifted vector sets centred around different true or assumed atomic
positions, some with origin peaks and some without. Terms (27) and
(28), and terms (26), (29) and (30) when $k \neq l$, give rise to
general background. (26), (29) and (30) give rise to peaks at assumed
and true atomic positions when $k = l$ and when the errors in the
positions of the known $P$ atoms are small and random. Under such
conditions and assuming $N$ and $P$ to be large, which is true in the
case of proteins, the peaks at atomic positions will have the following
strengths.


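$$r'_{Pj}: \quad \frac{mS_N - 2nS_P}{2S_P}\, f_{Pj} \qquad (31)$$

$$r_{Pj} + \langle r_{Pk} - r'_{Pk} \rangle \ (j \neq k): \quad \frac{mS_P}{2S_N}\, f_{Pj} \qquad (32)$$

$$r_{Qj} + \langle r_{Pk} - r'_{Pk} \rangle: \quad \frac{mS_P}{2S_N}\, f_{Qj} \qquad (33)$$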


The conventional $\Delta F$ map results when $m = n = 1$; likewise the
conventional $F_o$ map results when $m = 1$ and $n = 0$. It can be
readily shown that the known properties of these maps (Luzzati, 1953;
Dodson and Vijayan, 1971) follow from the above three expressions.


Parametrisation


As indicated earlier, two types of syntheses, one with (20) as
coefficients and the other the $\Delta F$ synthesis, have been used in
the refinement of protein structures. When using the former, one seeks
to obtain a true representation of the electron-density distribution,
with peak strengths of $f_{Pj}$ and $f_{Qj}$ at $r_{Pj}$ and $r_{Qj}$
respectively and zero density at $r'_{Pj}$. This is readily achieved in
a synthesis with

$$(mF_o - nF_c)\exp(i\alpha_c) \qquad (34)$$

as coefficients when

$$m = 2S_N/S_P \quad \text{and} \quad n = S_N^2/S_P^2 \qquad (35)$$

The above synthesis is identical, except for a scale factor, to a
synthesis with

$$(kF_o - F_c)\exp(i\alpha_c) \qquad (36)$$

as coefficients. The best results are obtained from this synthesis when

$$k = m/n = 2S_P/S_N \qquad (37)$$

Thus the theory outlined earlier provides a rationalisation for
syntheses with a linear combination of $F_o$ and $F_c$ as the magnitude
of the coefficients; it also leads to the determination of the optimum
values of the parameters ($m$, $n$ or $k$) to be used. The
parametrisation can be made still more effective by using empirically
evaluated values of $S_N$ and $S_P$ as will be shown later.


The $\Delta F$ synthesis is most effective when $P \approx N$. The
peak strengths at $r'_{Pj}$, $r_{Pj}$ and $r_{Qj}$ are then
$-f_{Pj}/2$, $f_{Pj}/2$ and $f_{Qj}/2$ respectively. The peaks with
equal magnitudes and opposite signs at $r'_{Pj}$ and $r_{Pj}$ combine
to give the gradients made use of (as in (22)) to evaluate positional
shifts. In practice $P$ and $N$ differ significantly except in the very
final stages of refinement and, consequently, peaks at $r'_{Pj}$ and
$r_{Pj}$ will have reduced and unequal strengths in the $\Delta F$
synthesis. However, peak strengths at the atomic positions expected in
a $\Delta F$ map when $P \approx N$ can be reproduced in a map with

$$(m'F_o - n'F_c)\exp(i\alpha_c) \qquad (38)$$

as coefficients if the parameters are chosen as

$$m' = S_N/S_P \quad \text{and} \quad n' = \tfrac{1}{2}\left(1 + S_N^2/S_P^2\right) \qquad (39)$$

Here again the synthesis can be made more effective by the use of
empirically evaluated $S_P$ and $S_N$ (see later).


Evaluation of $S_P$ and $S_N$

Unlike in the structure of most small molecules, the definition of
atoms in protein crystals varies substantially from one region of the
structure to another. In general, most of the main chain atoms and
atoms belonging to internal side chains are well defined with low
"temperature factors". The atoms belonging to surface residues and
solvent molecules are usually poorly defined. They are often associated
with high temperature factors arising from large thermal vibration
amplitudes as well as static disorder corresponding to different
structural or conformational possibilities. Consequently these are
associated with weak and diffuse electron-densities. The positions of
the well-defined atoms are most often located in the early stages of
refinement and attempts are then made to locate poorly defined atoms.
Therefore, at any given stage in the course of the refinement, the
scattering power of the $P$ known atoms is likely to be different from
that of the remaining $Q$ atoms. Thus, $S_N^2$ and $S_P^2$ would not
represent the true scattering power of the complete structure and of
the known part of the structure respectively unless the form factor of
each atom is properly corrected for temperature factor. Some estimate
of the temperature factors of the known atoms may exist; no such
estimate for those of the unknown atoms will be available. Therefore,
it is advisable to evaluate $S_N$ and $S_P$ empirically. Assuming a
Wilsonian distribution of intensities, this can be done by replacing
$S_N^2$ and $S_P^2$ by $\langle F_N^2 \rangle$ and
$\langle F_P^2 \rangle$ respectively.

Yet another factor that needs to be considered is the differential
contribution of the $P$ and the $Q$ atoms to intensities at different
Bragg angles. If the $Q$ atoms belong mostly to surface residues and
solvent molecules, which is usually the case, their contribution,
though high at low resolution, is likely to decrease rapidly with
increasing Bragg angle. The differential fall off of the scattering
powers of the known and the unknown parts of the structure can be taken
care of by dividing the reciprocal space into a convenient number of
spherical shells with increasing radii ($4\sin^2\theta/\lambda^2$) and
then evaluating $S_N^2$ ($= \langle F_N^2 \rangle$) and $S_P^2$
($= \langle F_P^2 \rangle$) in each shell separately. One can then
obtain a curve for each parameter ($m$, $n$, $k$, $m'$ or $n'$) as a
function of Bragg angle. For each reflection the values of the
parameters for the corresponding Bragg angle can be used to construct
the appropriate Fourier coefficient.
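The shell-wise evaluation described above can be sketched as follows.
The sketch assumes that each reflection is supplied as a tuple of
$4\sin^2\theta/\lambda^2$, the observed amplitude and the calculated
amplitude, and that shells of equal width in $4\sin^2\theta/\lambda^2$
are acceptable; both choices, and the function name, are illustrative
assumptions. The parameters follow (35) and (37), with the mean squared
amplitudes standing in for $S_N^2$ and $S_P^2$.

```python
import math


def shell_parameters(reflections, n_shells):
    """Return one dict of m, n and k per shell (None for unusable shells).

    reflections: list of (s2, f_obs, f_calc), s2 = 4 sin^2(theta)/lambda^2.
    """
    s2_max = max(r[0] for r in reflections)
    sums = [[0, 0.0, 0.0] for _ in range(n_shells)]
    for s2, f_obs, f_calc in reflections:
        idx = min(int(n_shells * s2 / s2_max), n_shells - 1)
        sums[idx][0] += 1
        sums[idx][1] += f_obs * f_obs
        sums[idx][2] += f_calc * f_calc
    result = []
    for count, sum_obs_sq, sum_calc_sq in sums:
        if count == 0 or sum_obs_sq == 0.0 or sum_calc_sq == 0.0:
            result.append(None)
            continue
        # The reflection count cancels in the ratio of the two means.
        ratio = math.sqrt(sum_obs_sq / sum_calc_sq)  # S_N / S_P
        result.append({"m": 2.0 * ratio,
                       "n": ratio * ratio,
                       "k": 2.0 / ratio})
    return result
```

A smooth curve fitted through the shell values, rather than the raw
step function, would normally be used when constructing coefficients
for individual reflections.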


Treatment of inner reflections


Problems associated with the treatment of reflections with very low
Bragg angles (say, in the 6 Å sphere) have been discussed in most of
the reports on the refinement of protein structures. These are the
reflections most seriously affected by the presence of disordered
solvent regions in the crystal. Many of these solvent molecules, if not
most of them, are not, or cannot be, included in the structure factor
calculations. Their scattering power at high angles is likely to be
small on account of thermal or static disorder. Their contribution to
reflections at very low angles would, however, be substantial.
Therefore, the effect of the unknown part of the structure, made up
substantially of solvent molecules, is most pronounced for the inner
reflections. Thus, large discrepancies between the observed and the
calculated structure factors are expected for these reflections. The
observed discrepancies, however, appear to be systematic. In almost all
cases, the calculated structure factors are reported to be much greater
than the observed ones.


A satisfactory, though qualitative, explanation of this phenomenon,
based on Babinet's principle, has been given by Moews and Kretsinger
(1975). If the scattering matter is uniformly distributed in the unit
cell, the vector sum of the scattering amplitude from one region of the
unit cell and that from the remainder of the cell in the forward
direction ($\sin\theta/\lambda = 0$) must be identically equal to zero.
The electron-density can be considered to be uniformly distributed at
low resolution and the
Therefore, the contributions to the 


at this resolution; 
that are important 


Most workers choose to 


calculations. It is not Perha flections ‘rom refinement 


mit them altogether, e5- 
ht to be determined, as 
Beneral features of solvent 
ed that even when these 
ween Fp and Fy in each of - a 
Fo 1 The non-random natur 
rg Napa to larger differences betwee" 
ation et Structure factors ee 
between $F_P$ and $F_o$ are randomly distributed. Therefore, the
Fourier coefficients for these reflections should be given lower
weights when computing maps with (21), (34) or (38) as coefficients.
This is automatically achieved in (34) and (38) when the parameters
$m$, $n$, $m'$ and $n'$ are calculated using empirically evaluated
$S_P$ and $S_N$.


Reliability of results and checking procedures 


When good data at atomic resolution are available, as is normally
the case with small structures, automatic refinement procedures, by
and large, lead to reliable results. The refinement of protein
structures is complicated by several factors, the chief among which are
the limited nature of the data set, and the size and complexity of the
protein. Also, the definition of the structure, and hence the accuracy
of the results, varies considerably from one region of the structure to
another depending on the flexibility of the group involved. Therefore,
the reliability of the refined parameters needs to be considered
carefully.


Many authors have attempted to estimate errors by theoretical
means, for example, using the Luzzati diagram (Luzzati, 1952) or
Cruickshank's equations (Cruickshank, 1949). Errors have also been
estimated from the population variance in bond lengths before
regularisation. Comparison of the refined dimensions of chemically
equivalent but crystallographically independent molecules in the same
crystal or different crystals is yet another method employed for
estimating errors. In this method, however, the observed differences
contain contributions from errors as well as genuine differences
arising from differences in intermolecular interactions.


Perhaps the most interesting results pertaining to the reliability of
refined coordinates are those obtained during the refinement of 2Zn
insulin crystals (Dodson et al. 1979). The structure was refined
simultaneously by the difference Fourier method in Professor Dorothy
Hodgkin's


at 1.5 Å resolution were used in both the refinements. During the
course of both the refinements, automatic procedures were often
interrupted to add, delete or modify parts of the structure including
solvent molecules. Each refinement produced a set of coordinates which
was internally consistent and also led to acceptable agreement between
observed and calculated structure factors
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R<0.20 for pare Fala 
oe ee 
: Z -2 44 for a majority of protein 
atoms. There were, however, larger discrepancies in the positions of
the remaining protein atoms, which all belonged to surface residues,
and water molecules. In most instances, the discrepancies corresponded
to differences in detail, but in a few cases they represented gross
differences in side chain conformation. Thus, the agreement between the
two sets of coordinates, though good for the residues in the interior
of the molecule, was on the whole rather disappointing. The
discrepancies in the results were subsequently resolved and the correct
positions arrived at through a detailed manual examination of different
types of Fourier maps, including those in the earlier stages of
refinement, coupled with geometrical and chemical considerations.


The experience cited above clearly shows that the usual
crystallographic indicators for the convergence of refinement do not
assure that the refined parameters, especially those pertaining to
ill-defined structure, are necessarily correct. Often an interpretation
of diffuse density, even when essentially wrong, does not get
automatically corrected during the course of the refinement. This
emphasises the need for thorough periodic checks on the current set of
coordinates. One of the methods employed for this purpose consists in
leaving out part of the structure from the structure factor
calculations and then checking the atomic coordinates in that part of
the structure against the subsequent difference Fourier map. As an
example of this procedure, one set of calculations carried out towards
the end of the refinement of 2Zn insulin may be outlined here. In this
set of calculations, the asymmetric unit (0-a/3, 0-b/3, 0-c) was
divided into eight segments along the c axis. To start with, all the
atoms in the first segment (0 to c/8) were removed from structure
factor calculations and a difference Fourier map was computed covering
only this segment. The operation was repeated for the remaining seven
segments as well. Compiling the separately computed difference
densities in the eight segments, one obtained a map in which the
density in any given segment was formally independent of the input
atoms in that segment. The abrupt changes in density which, for obvious
reasons, occurred at the boundaries between adjacent segments made the
interpretation in the neighbourhood of the boundaries rather difficult.
Therefore, another set of difference Fourier maps was also computed
with segments -c/16 to c/16, c/16 to 3c/16 etc. The two sets of
difference Fourier maps provided a valuable check on the current set of
coordinates. A region in one of the difference Fourier maps and the
same region in the $F_o$ map are shown in Figure 8(a) and (b)
respectively for comparison.
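The bookkeeping for the two sets of segments can be sketched as below.
The sketch works in fractional coordinates along c and only selects
which atoms to leave out of each structure factor calculation; the
function names are hypothetical and the map calculation itself is not
shown.

```python
def omit_segments(n_segments, offset=0.0):
    """Return (lower, upper) fractional limits along c for each segment."""
    width = 1.0 / n_segments
    return [(offset + i * width, offset + (i + 1) * width)
            for i in range(n_segments)]


def in_segment(z, segment):
    """True if fractional coordinate z lies in the segment (modulo 1)."""
    lower, upper = segment
    return ((z - lower) % 1.0) < (upper - lower)


def atoms_to_omit(z_coords, segment):
    """Indices of atoms to remove before computing the map for a segment."""
    return [i for i, z in enumerate(z_coords) if in_segment(z, segment)]
```

Calling `omit_segments(8)` gives the first set (0 to c/8, and so on),
while `omit_segments(8, offset=-1.0 / 16.0)` gives the second set,
whose boundaries fall at the centres of the first.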


It would appear that the different types of Fourier syntheses are not
as effective as one would normally expect in indicating the errors in
the positions of the known atoms and in giving the correct positions of
the unknown atoms. When reconciling the two sets of coordinates in 2Zn
insulin arrived at through different refinement procedures, it appeared
as though atoms "remembered" their previous history even when they
were not included in the calculation of phase angles. One explanation
for this phenomenon might be related to the effect of errors in the
positions of the known atoms on the corrected positions of the known
atoms and the positions of unknown atoms obtained from Fourier
syntheses. Expressions (31) to (33) were derived from (25) to (30) on
the assumption that the errors in $r'_{Pj}$ were small and random. It
can be readily shown that non-random errors lead to shifts in peak
positions from $r_{Pj}$ and $r_{Qj}$. When the errors are systematic
and large, features in Fourier maps can no longer be divided in a
simple manner into those contributing to peaks at atomic positions and
those contributing to background. What was considered earlier as
background is also then likely to be important in determining peak
positions. It is also clear from the analysis that there is no inherent
mechanism for automatically correcting the positional errors (in
Fourier maps) resulting from non-random errors in the positions of the
input atoms. It is therefore important to carry out careful manual
examination of different Fourier maps at various stages of refinement
to make sure that large systematic errors are not introduced in the
input coordinates.


The author thanks the University Grants Commission, India, for

EXERCISES


I. Determine $\alpha_P$ for reflections with the following set of
parameters using Harker diagrams.


1) $F_P = 55$,
   $F_{H1} = 20$, $\alpha_{H1} = 84^\circ$, $F_{PH1} = 67$;
   $F_{H2} = 30$, $\alpha_{H2} = -14.5^\circ$, $F_{PH2} = 84$
2) $F_P = 50$,
   $F_H = 20$, $\alpha_H = 107^\circ$, $F_H'' = 2.5$;
   $F_{PH}^{+} = F_{PH}^{-} = 70$
3) $F_P = 50$,
   $F_H = 20$, $\alpha_H = 197^\circ$, $F_H'' = 2.5$;
   $F_{PH}^{+} = 51.5$, $F_{PH}^{-} = 50$


II. Compute mand ag for the following reflections using eqns. (2) to (6) 


in the text. 


= 68 


1) 54, Fry] = 19, Ory] = 80°, Fou) 
= 3 


. Qpyy = 712°, Fpy2 = 77 
a) E,=10, E2=20; (b) F)=4, E, =8 


2) F,= 56, Fy = 20, 94) = 95°, Fp = 70 
Fy = 20, ayy2 = =90, F pp? = 75 


a) E) = 10, Ep = 20; (b) EF; = 4, E, =8 


Answers


I.  1. $\alpha_P = 21^\circ$
    2. $\alpha_P = 107^\circ$
    3. $\alpha_P = 107^\circ$
II. 1. (a) $m = 0.695$, $\alpha_B = 45.4^\circ$
       (b) $m = 0.952$, $\alpha_B = 38.0^\circ$
    2. (a) $m = 0.484$, $\alpha_B = 159.1^\circ$
       (b) $m = 0.548$, $\alpha_B = 154.8^\circ$
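The Harker construction for a single derivative has a simple algebraic
equivalent: the two intersections of the phase circles lie at
$\alpha_H \pm \cos^{-1}[(F_{PH}^2 - F_P^2 - F_H^2)/2F_PF_H]$. The
sketch below evaluates these two candidate phases and may be used to
check a graphical solution; the function name is an assumption, and the
errors in the data are ignored.

```python
import math


def harker_phases(fp, fh, alpha_h_deg, fph):
    """Return the two candidate protein phases (degrees) for one derivative."""
    cos_arg = (fph ** 2 - fp ** 2 - fh ** 2) / (2.0 * fp * fh)
    # Clamp so that small measurement errors do not push the argument
    # outside the domain of acos (circles that just fail to intersect).
    cos_arg = max(-1.0, min(1.0, cos_arg))
    delta = math.degrees(math.acos(cos_arg))
    return ((alpha_h_deg + delta) % 360.0, (alpha_h_deg - delta) % 360.0)
```

With two derivatives, the phase common to both pairs of candidates (or
nearly so, in the presence of errors) is the one indicated by the
Harker diagram.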


